Help needed to speed up function

edited February 2005 in Development
Can anyone help me re-factor this code snippet? The "MAP" section holds information about which tiles go where on the screen (handled by another function). The following code uses the "MAP" section to work out what colour each tile should be, based on the "COLOUR" lookup. Each tile needs a 2x2 colour block. The code does the job fine, but I feel it could be sped up a bit. I haven't used Z80 for many years and have forgotten some of the "tricks".


ORG 50000

LD HL,22528
LD IX,MAP

LD C,9
LOOP1 LD B,16

LOOP2 LD A,(IX+0)
LD DE,COLOUR
ADD A,E
LD E,A
LD A,(DE)
NOTDIRTILE PUSH HL
LD (HL),A
INC L
LD (HL),A
PUSH AF
LD A,31
ADD A,L
LD L,A
POP AF
LD (HL),A
INC L
LD (HL),A
POP HL
INC HL
INC HL
INC IX
DJNZ LOOP2
PUSH DE
LD DE,32
ADD HL,DE
POP DE
DEC C
JP NZ,LOOP1
RET

ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
Post edited by Mr Millside on

Comments

  • edited February 2005
    The only thing that pops out at me currently is:

    PUSH DE ;+ 11
    LD DE,32 ;+ 10
    ADD HL,DE ;+ 11
    POP DE ;+ 10

    Total TStates = 42

    Which you can do via:

    LD A,32 ;+ 7
    ADD A,L ;+ 4 Add 32 to Low part first
    LD L,A ;+ 4
    LD A,0 ;+ 7
    ADC A,H ;+ 4 Add carry out to High part
    LD H,A ;+ 4

    Total TStates = 30

    Which avoids the need for the slower stacking operations.

    EDIT: Looking at the code again I can't actually see any reason for you to preserve DE, since you overwrite it anyway. Hence just remove the PUSH/POP DE and save 21 T States.
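
    As a rough sketch of the row advance once the PUSH/POP pair is gone (T-state counts are the standard Z80 figures; HL is assumed to point into the attribute area as in the original listing):

    ```asm
    ; Step HL over the second attribute row of the tiles just drawn
                   LD DE,32        ; 10 T-states
                   ADD HL,DE       ; 11 T-states, the carry into H comes for free
                                   ; 21 in total, against 30 for the 8-bit LD/ADD/ADC version
    ```

    This only holds while nothing after the loop still needs the old value of DE, which appears to be the case here because DE is reloaded at the top of LOOP2.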

    [ This Message was edited by: cyborg on 2005-02-08 13:12 ]
  • edited February 2005
    Oh BTW, you may want to look at using the shadow register set and the EXX and EX AF,AF' instructions - since you can set up the registers before the loops and then utilise the extra registers to avoid the stacking operations you perform. I don't have time to do that right at the moment but I'll have a look at that later.
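
    A minimal sketch of the simplest case, assuming no interrupt routine or other code relies on the contents of AF': the PUSH AF / POP AF pair around the L = L + 31 step costs 11 + 10 = 21 T-states, while two EX AF,AF' cost 4 + 4 = 8.

    ```asm
    ; Move HL from the top-right cell to the bottom-left cell of a tile
    ; while keeping the colour byte that is held in A
                   EX AF,AF'       ;  4  park the colour in A'
                   LD A,31         ;  7
                   ADD A,L         ;  4
                   LD L,A          ;  4  L = L + 31
                   EX AF,AF'       ;  4  colour back in A
    ```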
  • edited February 2005
    Few quick questions:

    Is there a good reason for separating the COLOUR and MAP structures? i.e. could you not get away with putting the colour data straight into the map. That would halve the number of lookups per cell, speeding things up a bit.

    Assuming it is necessary to separate them, is there any reason for not packing the MAP data structure? Each element only requires two bits, so you can probably reduce lookup overhead by packing data.
  • edited February 2005
    The map and colour data are stored as packed data. I expand a screen into the "MAP" and "COLOUR" structures before I display them. The reason I do this is that I need to modify the "standing" data depending on what has happened in the game. I also use the "modified" data in "MAP" to determine if the player has fallen off the edge of the tile or collected an item etc.
  • edited February 2005
    The two consecutive INC HL instructions can also
    be replaced by two INC L,
    saving another 4 T-states per pass.

    The POP HL always returns an even number, so
    INC L will do. The second INC L reaches 256 after C reaches zero, so HL must be preloaded with 22528-256, and after LOOP1 you must add an INC H. You save 16 * 8 * 4 - 8 * 4 T-states.




    _________________
    Just POKE 23607,0 !
    Remember: beep <> Dr Beep !!!!

    [ This Message was edited by: Dr BEEP on 2005-02-08 13:33 ]
  • edited February 2005
    I have made the changes recommended so far (I think), so the code now looks like this:

    ORG 50000

    LD HL,22528
    LD IX,MAP

    LD C,9
    LOOP1 LD B,16

    LOOP2 LD A,(IX+0)
    LD DE,COLOUR
    ADD A,E
    LD E,A
    LD A,(DE)
    NOTDIRTILE PUSH HL
    LD (HL),A
    INC L
    LD (HL),A
    EX AF,AF'
    LD A,31
    ADD A,L
    LD L,A
    EX AF,AF'
    LD (HL),A
    INC L
    LD (HL),A
    POP HL
    INC L
    INC L
    INC IX
    DJNZ LOOP2

    LD A,32
    ADD A,L
    LD L,A
    LD A,0
    ADC A,H
    LD H,A

    DEC C
    JP NZ,LOOP1
    RET

    ORG 55040
    COLOUR DEFB 20,30,40,50
    MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
    DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
    DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
    DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
    DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
    DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
    DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
    DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
  • edited February 2005
    On 2005-02-08 13:27, Dr BEEP wrote:
    The two consecutive INC HL instructions can also
    be replaced by two INC L,
    saving another 4 T-states per pass.

    The POP HL always returns an even number, so
    INC L will do. The second INC L reaches 256 after C reaches zero, so HL must be preloaded with 22528-256, and after LOOP1 you must add an INC H. You save 16 * 8 * 4 - 8 * 4 T-states.





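    I haven't tested the code, but if I read it correctly L never wraps inside the inner loop: each tile row starts at a multiple of 64 and the 16 tiles only advance L by 32, and the carry is picked up when 32 is added after the loop. That makes the INC H totally unnecessary. LD HL,22528 and two INC L will do.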

  • edited February 2005
    I see another win
                   ORG 50000
    
                   LD HL,22528
                   LD IX,MAP
        
                   LD C,9        
    LOOP1          LD B,16
    
    LOOP2          LD A,(IX+0)
                   LD DE,COLOUR
                   ADD A,E
                   LD E,A
                   LD A,(DE)
    NOTDIRTILE     LD (HL),A
                   INC L
                   PUSH HL 
                   LD (HL),A
                   EX AF,AF'
                   LD A,31
                   ADD A,L
                   LD L,A
                   EX AF,AF'
                   LD (HL),A
                   INC L
                   LD (HL),A
                   POP HL
                   INC L
                   INC IX
                   DJNZ LOOP2
    
                   LD A,32
                   ADD A,L 
                   LD L,A 
                   LD A,0 
                   ADC A,H  
                   LD H,A 
    
                   DEC C
                   JP NZ,LOOP1
                   RET
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    


  • edited February 2005
    On 2005-02-08 13:49, Dr BEEP wrote:
    I see another win and another
                   ORG 50000
    
                   LD HL,22528
                   LD IX,MAP
        
                   LD C,9        
    LOOP1          LD B,16
    
    LOOP2          LD A,(IX+0)
                   LD DE,COLOUR
                   ADD A,E
                   LD E,A
                   LD A,(DE)
    NOTDIRTILE     LD (HL),A
                   INC L
                   PUSH HL 
                   LD (HL),A
                   EX AF,AF'
                   LD A,31
                   ADD A,L
                   LD L,A
                   EX AF,AF'
                   LD (HL),A
                   INC L
                   LD (HL),A
                   POP HL
                   INC L
                   INC IX
                   DJNZ LOOP2
    
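                   LD A,32
                   ADD A,L 
                   LD L,A 
                   LD A,B      ; B always holds 0 after DJNZ
                   ADC A,H  
                   LD H,A 
    
    ; The LD A,B / ADC A,H / LD H,A above (12 T-states) can also be done
    ; as follows (use one version or the other, not both):
    ;              JR NC,NOCAR ; 12 T-states when taken, 7 when not
    ;              INC H       ; 4, only reached when L wrapped
    ; NOCAR        DEC C
    ; That costs 12 T-states on the rows without a carry and 11 on the
    ; rows with one, so the gain over the ADC version is marginal.
    
                   DEC C
                   JP NZ,LOOP1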
                   RET
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    



    [ This Message was edited by: Dr BEEP on 2005-02-08 14:02 ]
  • edited February 2005
    I ran the code and it seems to run OK.
  • edited February 2005
    On 2005-02-08 13:57, Mr Millside wrote:
    I ran the code and it seems to run OK.

    Which code did you run? I made another few adjustments.

    PUSH later, ADD with B-reg or JR NC test


  • edited February 2005
    On 2005-02-08 13:49, Dr BEEP wrote:
    I see another win

    Following on from that code:
                   ORG 50000
    
                   LD HL,22528
                   LD IX,MAP
        
                   LD C,9        
    LOOP1          LD B,16
    
    LOOP2          LD A,(IX+0)
                   LD DE,COLOUR
                   ADD A,E
                   LD E,A
                   LD A,(DE)
    NOTDIRTILE     LD (HL),A
                   INC L
                   PUSH HL 
                   LD (HL),A
                   EX AF,AF'
                   LD A,31
                   ADD A,L
                   LD L,A
                   EX AF,AF'
                   LD (HL),A
                   INC L
                   LD (HL),A
                   POP HL
                   INC L
                   INC IX
                   DJNZ LOOP2
    
    ; DE's previous value does not need to be preserved at all
                   LD DE,32
                   ADD HL,DE 
    
                   DEC C
                   JP NZ,LOOP1
                   RET
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    

    Now if you really want extra speed then you can unroll the loops.
  • edited February 2005
                   ORG 50000
    
                   LD DE,COLOUR
                   ADD A,E
                   LD E,A
                   LD A,(DE)
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    

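    COLOUR sits at 55040 = 215 * 256, so the high byte of DE is always 215 and the low byte is just the tile number.
    The table can be read like this

    LD D,215
    LD E,(IX+0)
    LD A,(DE)

    Since D doesn't change you can preload D with 215 before C is loaded with 9 (as long as nothing else in the loop overwrites D)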
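
    To put rough numbers on it, here is a sketch of the two lookups side by side, with the standard Z80 T-state counts. It assumes, as in the listings above, that COLOUR starts exactly on a 256-byte boundary and that IX points at the current MAP entry.

    ```asm
    ; Lookup as originally written: 44 T-states per tile
                   LD A,(IX+0)     ; 19  tile number
                   LD DE,COLOUR    ; 10
                   ADD A,E         ;  4
                   LD E,A          ;  4
                   LD A,(DE)       ;  7  colour for that tile

    ; Page-aligned lookup: 26 T-states per tile once D is set up
                   LD D,COLOUR/256 ;  7  high byte of the table, done outside the tile loop
                   LD E,(IX+0)     ; 19  tile number becomes the low byte directly
                   LD A,(DE)       ;  7  colour for that tile
    ```

    The shortcut stops working if COLOUR is moved off a page boundary, because the tile number is then no longer the low byte of the entry's address.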


  • edited February 2005
                   ORG 50000
    
                   LD HL,22528
                   LD IX,MAP
        
                   LD C,9 
           
    LOOP1          LD B,16
                   LD D,COLOUR/256   ; = 215
               
    LOOP2          LD E,(IX+0)
                   LD A,(DE)
    NOTDIRTILE     LD (HL),A
                   INC L
                   PUSH HL 
                   LD (HL),A
                   EX AF,AF'
                   LD A,31
                   ADD A,L
                   LD L,A
                   EX AF,AF'
                   LD (HL),A
                   INC L
                   LD (HL),A
                   POP HL
                   INC L
                   INC IX
                   DJNZ LOOP2
    
                   LD DE,32
                   ADD HL,DE 
    
                   DEC C
                   JP NZ,LOOP1
                   RET
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    


    [ This Message was edited by: dr beep on 2005-02-08 14:36 ]
  • edited February 2005
    Will this work ???
                   ORG 50000
    
                   LD HL,22528
                   LD IX,MAP
        
    LOOP1          LD B,16
                   LD D,COLOUR/256   ; = 215
               
    LOOP2          LD E,(IX+0)
                   LD A,(DE)
    NOTDIRTILE     LD (HL),A
                   INC L
                   PUSH HL 
                   LD (HL),A
                   EX AF,AF'
                   LD A,31
                   ADD A,L
                   LD L,A
                   EX AF,AF'
                   LD (HL),A
                   INC L
                   LD (HL),A
                   POP HL
                   INC L
                   INC IX
                   DJNZ LOOP2
    
                   LD DE,32
                   ADD HL,DE 
    
                   JP NC,LOOP1
                   RET
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    
  • edited February 2005
    This one doesn't work. It sets the border and various pixels, so it seems to be looping across the whole memory.

    ORG 50000

    LD HL,22528
    LD IX,MAP

    LOOP1 LD B,16
    LD D,COLOUR/256 ; = 215

    LOOP2 LD E,(IX+0)
    LD A,(DE)
    NOTDIRTILE LD (HL),A
    INC L
    PUSH HL
    LD (HL),A
    EX AF,AF'
    LD A,31
    ADD A,L
    LD L,A
    EX AF,AF'
    LD (HL),A
    INC L
    LD (HL),A
    POP HL
    INC L
    INC IX
    DJNZ LOOP2

    LD DE,32
    ADD HL,DE

    JP NC,LOOP1
    RET

    ORG 55040
    COLOUR DEFB 20,30,40,50
    MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
    DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
    DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
    DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
    DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
    DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
    DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
    DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
  • edited February 2005
    Then the one before it must work.

    (the one that still uses the C register)
  • edited February 2005
    You are correct. The routine before, with the "C" value, works a treat.
  • edited February 2005
    And this one?

                   ORG 50000
    
                   LD HL,22528
                   LD IX,MAP
        
                   LD C,9 
           
    LOOP1          LD B,16
                   LD D,COLOUR/256   ; = 215
               
    LOOP2          LD E,(IX+0)
                   LD A,(DE)
    NOTDIRTILE     LD (HL),A
                   INC L
                   LD E,L     ; save in E 
                   LD (HL),A
                   SET 5,L    ; in fact Add 32
                   LD (HL),A
                   DEC L      ; you added 32
                   LD (HL),A
                   LD L,E     ; restore value
                   INC L
                   INC IX
                   DJNZ LOOP2
    
                   LD DE,32
                   ADD HL,DE 
    
                   DEC C
                   JP NZ,LOOP1
                   RET
    
                   ORG 55040
    COLOUR         DEFB 20,30,40,50
    MAP            DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
                   DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
                   DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
                   DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
                   DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    
    

    Sorry if I am too enthusiastic
  • edited February 2005
    This new one works. I'll have to use some of these techniques in the code that draws the tiles to the screen.
  • edited February 2005
    On 2005-02-08 15:38, Mr Millside wrote:
    This new one works. I'll have to use some of these techniques in the code that draws the tiles to the screen.

    More than that, think logically.

    I noticed that your code goes from line 0 to line 2 and so on.
    Therefore I know that bit 5 of L is low on those lines, and adding 32 just sets bit 5. This can be done faster by setting that bit directly.

    Choosing logical addresses for tables also works well. You chose a good address for COLOUR, which makes reading the table much simpler. By filling the attribute block clockwise instead of left to right, top to bottom, I gained some speed as well. Then the final step is to eliminate all duplicated work (moving the PUSH after the INC L did that trick) and use faster instructions where possible.
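
    The bit-5 step can be sketched on a concrete address. 5841h is just an example cell on an even attribute row (row 2, column 1), picked for illustration:

    ```asm
    ; Stepping down one attribute row from an even row
                   LD HL,5841h     ; even row, so bit 5 of L is 0
                   SET 5,L         ; 8 T-states, HL = 5861h = 5841h + 32, A untouched

    ; The same step done with arithmetic: 7 + 4 + 4 = 15 T-states and A is lost
                   LD HL,5841h
                   LD A,32
                   ADD A,L
                   LD L,A          ; HL = 5861h again
    ```

    On an odd row bit 5 is already set, so SET 5,L would leave HL unchanged; the shortcut relies on the routine only ever stepping down from even rows.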

    Cyborg suggests unrolling the loops, but that costs memory. I think this is a nice compromise between speed and memory.




    [ This Message was edited by: Dr BEEP on 2005-02-08 15:55 ]
  • edited February 2005
    Cyborg suggests unrolling the loops, but that costs memory. I think this is a nice compromise between speed and memory.

    Well, that entirely depends on what the most important factor is. If fast screen updating is essential, then in this case you are not sacrificing too much memory by unrolling the 16 iterations of the inner loop. Reducing the looping overhead and simplifying some of the memory fetch code could save a significant number of T-states per iteration (indexed instructions are generally much slower), times 16. I would estimate shaving off at least 16 T-states per tile, giving at least 256 T-states per row. This could make a huge difference, given that speed can be critical when dealing with the screen.
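
    As a hedged illustration of the idea, here is the inner loop unrolled by two rather than by sixteen. The tile body is taken from the SET 5,L listing earlier in the thread; HL, IX and D = COLOUR/256 are assumed to be set up exactly as there.

    ```asm
                   LD B,8          ; 8 passes of two tiles each
    LOOP2          LD E,(IX+0)     ; first tile of the pair
                   LD A,(DE)
                   LD (HL),A
                   INC L
                   LD E,L
                   LD (HL),A
                   SET 5,L
                   LD (HL),A
                   DEC L
                   LD (HL),A
                   LD L,E
                   INC L
                   LD E,(IX+1)     ; second tile, read with a +1 offset
                   LD A,(DE)
                   LD (HL),A
                   INC L
                   LD E,L
                   LD (HL),A
                   SET 5,L
                   LD (HL),A
                   DEC L
                   LD (HL),A
                   LD L,E
                   INC L
                   INC IX          ; step the map pointer past both tiles
                   INC IX
                   DJNZ LOOP2      ; one DJNZ per two tiles
    ```

    This halves the number of DJNZ instructions, which works out to about 104 T-states per row at 13 T-states per taken DJNZ. A full unroll would go further, since the fixed offsets (IX+0) to (IX+15) would let a single pointer adjustment replace the sixteen INC IX, at the cost of a much longer routine.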
  • edited February 2005
    Okay, here's my version. Should be considerably faster, but haven't counted cycles (yet)
           org 50000
    
           ld de,22528
           ld hl,map
    
           ld b,9
    loop2  push bc
    
           push de
          
           ld b,16
    loop   push bc
           ld b,0d7h
           ld c,(hl) ; BC -> Colour value
           ld a,(bc)
           ld (de),a
           inc e
           ld (de),a
           inc de
           inc l
           pop bc
           djnz loop
    
           ex (sp),hl
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
    
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
           ldi
    
           pop hl
    
           pop bc
           djnz loop2
    
           ret
    
    
     ORG 55040 
    COLOUR DEFB 20,30,40,50 
    MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3 
     DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1 
     DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1 
     DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1 
     DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1 
     DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3 
     DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1 
     DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1 
     DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
    
    