Help needed to speed up function
Can anyone help me refactor this code snippet? The "MAP" section holds information about which tiles go where on the screen (handled by another function). The following code uses the "MAP" section to work out what colour each tile should be, based on the "COLOUR" lookup. Each tile needs a 2x2 colour block. The code does the job fine, but I feel it could be sped up a bit. I haven't used Z80 for many years and have forgotten some of the "tricks".
ORG 50000
LD HL,22528
LD IX,MAP
LD C,9
LOOP1 LD B,16
LOOP2 LD A,(IX+0)
LD DE,COLOUR
ADD A,E
LD E,A
LD A,(DE)
NOTDIRTILE PUSH HL
LD (HL),A
INC L
LD (HL),A
PUSH AF
LD A,31
ADD A,L
LD L,A
POP AF
LD (HL),A
INC L
LD (HL),A
POP HL
INC HL
INC HL
INC IX
DJNZ LOOP2
PUSH DE
LD DE,32
ADD HL,DE
POP DE
DEC C
JP NZ,LOOP1
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
Post edited by Mr Millside on
Comments
PUSH DE ;+ 11
LD DE,32 ;+ 10
ADD HL,DE ;+ 11
POP DE ;+ 10
Total TStates = 42
Which you can do via:
LD A,32 ;+ 7
ADD A,L ;+ 4 Add 32 to Low part first
LD L,A ;+ 4
LD A,0 ;+ 7
ADC A,H ;+ 4 Add carry out to High part
LD H,A ;+ 4
Total TStates = 30
Which avoids the need for the slower stacking operations.
EDIT: Looking at the code again I can't actually see any reason for you to preserve DE, since you overwrite it anyway. Hence just remove the PUSH/POP DE and save 21 T States.
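For what it's worth, a minimal sketch of the end-of-row step once the PUSH/POP pair is dropped. It assumes, as in the original listing, that DE is reloaded with LD DE,COLOUR at the top of LOOP2 before it is next read, so nothing is lost by overwriting it here:

```asm
        LD DE,32           ; 10 T-states, DE is reloaded at the top of LOOP2 anyway
        ADD HL,DE          ; 11 T-states (ADD HL,ss), 21 in total for the row step
```

On those timings the 16-bit add without the stack traffic also comes out cheaper than the 30 T-state ADD/ADC sequence.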
[ This Message was edited by: cyborg on 2005-02-08 13:12 ]
Is there a good reason for separating the COLOUR and MAP structures? I.e. could you not get away with putting the colour data straight into the map? That would halve the number of lookups per cell, speeding things up a bit.
Assuming it is necessary to separate them, is there any reason for not packing the MAP data structure? Each element only requires two bits, so you can probably reduce lookup overhead by packing the data.
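A minimal sketch of the first suggestion, assuming the other routine does not need the map as tile numbers, so the map can hold attribute bytes directly. MAPATTR is a hypothetical table (here just the first map row run through COLOUR), and only one row of 16 tiles is filled to keep the listing short. It is untested:

```asm
        ORG 50000
        LD HL,22528        ; top-left cell of the attribute file
        LD DE,MAPATTR      ; hypothetical map holding attribute bytes, not indices
        LD B,16            ; one row of 16 tiles only in this sketch
TILE    LD A,(DE)          ; attribute straight from the map, no COLOUR lookup
        INC DE
        LD (HL),A          ; top left
        INC L
        LD (HL),A          ; top right
        SET 5,L            ; down one line: bit 5 of L is clear on the top line
        LD (HL),A          ; bottom right
        DEC L
        LD (HL),A          ; bottom left
        RES 5,L            ; back up to the top line
        INC L
        INC L              ; move on to the next tile
        DJNZ TILE
        RET
; first MAP row (0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3) translated through COLOUR
MAPATTR DEFB 20,30,40,20,20,30,40,30,30,40,50,20,20,30,40,50
```

The price is that the map can no longer double as the tile layout for the drawing routine, which may be exactly why the two tables were kept apart.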
The two INC HL can be replaced by two INC L,
saving another 4 T-states per run.
The POP HL always returns an even number, so
INC L will do. The second INC L reaches 256 after C reaches zero, so HL must be preloaded with 22528-256 and an INC H added after LOOP1. You save 16 * 8 * 4 - 8 * 4 T-states.
_________________
Just POKE 23607,0 !
Remember: beep <> Dr Beep !!!!
[ This Message was edited by: Dr BEEP on 2005-02-08 13:33 ]
ORG 50000
LD HL,22528
LD IX,MAP
LD C,9
LOOP1 LD B,16
LOOP2 LD A,(IX+0)
LD DE,COLOUR
ADD A,E
LD E,A
LD A,(DE)
NOTDIRTILE PUSH HL
LD (HL),A
INC L
LD (HL),A
EX AF,AF'
LD A,31
ADD A,L
LD L,A
EX AF,AF'
LD (HL),A
INC L
LD (HL),A
POP HL
INC L
INC L
INC IX
DJNZ LOOP2
LD A,32
ADD A,L
LD L,A
LD A,0
ADC A,H
LD H,A
DEC C
JP NZ,LOOP1
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
I haven't tested the code, but as I read it L never wraps round inside the inner loop, which makes the INC H totally unnecessary. LD HL,22528 and two INC L will do.
ORG 50000
LD HL,22528
LD IX,MAP
LD C,9
LOOP1 LD B,16
LOOP2 LD A,(IX+0)
LD DE,COLOUR
ADD A,E
LD E,A
LD A,(DE)
NOTDIRTILE LD (HL),A
INC L
PUSH HL
LD (HL),A
EX AF,AF'
LD A,31
ADD A,L
LD L,A
EX AF,AF'
LD (HL),A
INC L
LD (HL),A
POP HL
INC L
INC IX
DJNZ LOOP2
LD A,32
ADD A,L
LD L,A
LD A,0
ADC A,H
LD H,A
DEC C
JP NZ,LOOP1
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
[ This Message was edited by: Dr BEEP on 2005-02-08 14:02 ]
Which code did you run? I made another few adjustments.
PUSH later, ADD with B-reg or JR NC test
Following on from that code:
ORG 50000
LD HL,22528
LD IX,MAP
LD C,9
LOOP1 LD B,16
LOOP2 LD A,(IX+0)
LD DE,COLOUR
ADD A,E
LD E,A
LD A,(DE)
NOTDIRTILE LD (HL),A
INC L
PUSH HL
LD (HL),A
EX AF,AF'
LD A,31
ADD A,L
LD L,A
EX AF,AF'
LD (HL),A
INC L
LD (HL),A
POP HL
INC L
INC IX
DJNZ LOOP2
; DE's previous value does not need to be preserved at all
LD DE,32
ADD HL,DE
DEC C
JP NZ,LOOP1
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
Now if you really want extra speed then you can unroll the loops.
ORG 50000
LD DE,COLOUR
ADD A,E
LD E,A
LD A,(DE)
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
COLOUR is at 55040 = 215 * 256, so after LD DE,COLOUR, DE holds the value 215 * 256 and E is 0.
The table can be read like this:
LD D,215
LD E,(IX+0)
Since D doesn't change inside LOOP2, you can preload D with 215 before the inner loop starts.
ORG 50000
LD HL,22528
LD IX,MAP
LD C,9
LOOP1 LD B,16
LD D,COLOUR/256 ; = 215
LOOP2 LD E,(IX+0)
LD A,(DE)
NOTDIRTILE LD (HL),A
INC L
PUSH HL
LD (HL),A
EX AF,AF'
LD A,31
ADD A,L
LD L,A
EX AF,AF'
LD (HL),A
INC L
LD (HL),A
POP HL
INC L
INC IX
DJNZ LOOP2
LD DE,32
ADD HL,DE
DEC C
JP NZ,LOOP1
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
[ This Message was edited by: dr beep on 2005-02-08 14:36 ]
ORG 50000
LD HL,22528
LD IX,MAP
LOOP1 LD B,16
LD D,COLOUR/256 ; = 215
LOOP2 LD E,(IX+0)
LD A,(DE)
NOTDIRTILE LD (HL),A
INC L
PUSH HL
LD (HL),A
EX AF,AF'
LD A,31
ADD A,L
LD L,A
EX AF,AF'
LD (HL),A
INC L
LD (HL),A
POP HL
INC L
INC IX
DJNZ LOOP2
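LD DE,32
ADD HL,DE ; carry is only set once HL passes 65535
JP NC,LOOP1 ; so this test does not stop at the bottom of the map; the C counter is still needed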
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
(the one that still uses the C register)
ORG 50000
LD HL,22528
LD IX,MAP
LD C,9
LOOP1 LD B,16
LD D,COLOUR/256 ; = 215
LOOP2 LD E,(IX+0)
LD A,(DE)
NOTDIRTILE LD (HL),A
INC L
LD E,L ; save in E
LD (HL),A
SET 5,L ; in fact Add 32
LD (HL),A
DEC L ; you added 32
LD (HL),A
LD L,E ; restore value
INC L
INC IX
DJNZ LOOP2
LD DE,32
ADD HL,DE
DEC C
JP NZ,LOOP1
RET
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
Sorry if I am too enthusiastic.
More than that, think logically.
I noticed that your code goes from line 0 to line 2 etc...
Therefore I know that bit 5 is low, and when you add 32 you just set bit 5. This can be done faster by just setting that bit.
Choosing logical addresses for tables also works well. You chose a good address for COLOUR, which makes reading the table much simpler. By filling the attributes clockwise instead of left to right, top to bottom, I gained some speed as well. The final step is to eliminate all duplicated work (the PUSH after INC L did that trick) and use faster instructions where possible.
Cyborg suggests unrolling the code, but that costs memory. I think this is a nice compromise between speed and memory.
[ This Message was edited by: Dr BEEP on 2005-02-08 15:55 ]
Well, that entirely depends on what the most important factor is. If fast screen updating is essential, then in this case you are not sacrificing too much memory by unrolling the 16 iterations of the inner loop. Reducing the looping overhead and being able to simplify some of the memory fetch code could save a significant number of T-states per iteration (given that indexed instructions are generally much slower), times 16. I would estimate being able to shave at least 16 T-states off each iteration, giving at least a 256 T-state improvement per row. This could make a huge difference to the performance of the code, given that speed can be critical when dealing with the screen.
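A rough sketch of the unrolling idea, assuming a partial unroll by two rather than the full sixteen so the listing stays short. It reuses the SET 5,L trick and the page-aligned COLOUR table from the earlier posts; using RES 5,L to get back to the top line is my own substitution, and none of this has been timed or tested on a real machine. Each pass handles two tiles, so one DJNZ (13 T-states when taken) is saved per pair:

```asm
        ORG 50000
        LD HL,22528        ; start of the attribute file
        LD IX,MAP
        LD C,9             ; 9 tile rows
LOOP1   LD B,8             ; 8 passes of 2 tiles = 16 tiles per row
        LD D,COLOUR/256    ; = 215, reloaded because LD DE,32 below overwrites D
LOOP2   LD E,(IX+0)        ; first tile of the pair
        LD A,(DE)
        LD (HL),A          ; top left
        INC L
        LD (HL),A          ; top right
        SET 5,L            ; +32, bit 5 is clear on the top line
        LD (HL),A          ; bottom right
        DEC L
        LD (HL),A          ; bottom left
        RES 5,L            ; back to the top line
        INC L
        INC L
        LD E,(IX+1)        ; second tile, read at a fixed offset
        LD A,(DE)
        LD (HL),A
        INC L
        LD (HL),A
        SET 5,L
        LD (HL),A
        DEC L
        LD (HL),A
        RES 5,L
        INC L
        INC L
        INC IX
        INC IX             ; step the map pointer past both tiles
        DJNZ LOOP2
        LD DE,32
        ADD HL,DE          ; skip the bottom line that was just filled
        DEC C
        JP NZ,LOOP1
        RET
        ORG 55040          ; COLOUR must sit on a 256-byte boundary for the lookup
COLOUR  DEFB 20,30,40,50
MAP     DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
        DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
        DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
        DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
        DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
        DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
        DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
        DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
        DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
```

Unrolling further follows the same pattern: (IX+2), (IX+3) and so on, with B and the number of INC IX adjusted to match.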
org 50000
ld de,22528
ld hl,MAP
ld b,9
loop2 push bc
push de
ld b,16
loop push bc
ld b,0d7h
ld c,(hl) ; BC -> Colour value
ld a,(bc)
ld (de),a
inc e
ld (de),a
inc de
inc l
pop bc
djnz loop
ex (sp),hl
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
pop hl
pop bc
djnz loop2
ret
ORG 55040
COLOUR DEFB 20,30,40,50
MAP DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1
DEFB 2,1,1,1,2,1,1,2,0,1,0,1,2,3,1,1
DEFB 0,1,2,0,0,1,2,1,1,2,3,0,0,1,2,3
DEFB 2,0,0,1,2,1,1,2,3,0,0,1,2,3,1,1
DEFB 2,1,1,2,3,0,2,0,0,1,0,1,2,3,1,1
DEFB 2,1,1,2,3,2,0,0,1,0,0,1,2,3,1,1