Small demo to show my sprite engine for ZX Spectrum 48K

This demo shows 13 sprites moving at 50fps, totally flicker free: 12 enemies bouncing off the edges and the main character, controlled with OPSA (note S instead of Q).

To see the demo, download the zip and run demo.tap (it only works on 48K). The rest of the files in the archive are the sources. Use this thread to discuss any technical questions.

http://retrolandia.net/foro/attachment.php?aid=325
Post edited by antoniovillena on

Comments

  • edited December 2013
    Looks nice. 13 sprites is certainly not bad. Well done. Better than what I would ever write, for sure.

    I also looked at some code. The map decompressor and the png decoding routines are interesting too.

    Which reminds me I've also written my own simple xor, not masked sprite routine and I only used it once. I should dig it up and use it some more.
  • edited December 2013
    Idk looks good enough c:
    Just one thing, isn't using LD:PUSH in a flip screen game a bit of an overkill?

    Have you tried doing it like this, say
    ;...
    	pop de
    	ld a,(hl)
    	ld (bc),a
    	inc bc
    	and e
    	or d
    	ld (hl),a
    	inc l
    	;...
    
    i.e. store the background graphics under each sprite separately, and then erase the sprites using that.
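
As a rough illustration of the save-then-mask idea in the snippet above, here is a minimal Python model. It assumes a flat list of screen bytes and a sprite given as (mask, graphic) pairs, and it ignores the Spectrum's real screen layout; the names `draw_masked` and `erase` are made up for this sketch and are not taken from the engine.

```python
def draw_masked(screen, addr, sprite, saved):
    """Draw (mask, graphic) byte pairs, saving each background byte first."""
    for i, (mask, gfx) in enumerate(sprite):
        background = screen[addr + i]                  # ld a,(hl)
        saved.append(background)                       # ld (bc),a / inc bc
        screen[addr + i] = (background & mask) | gfx   # and e / or d / ld (hl),a


def erase(screen, addr, saved):
    """Put the saved background bytes back, which removes the sprite."""
    for i, background in enumerate(saved):
        screen[addr + i] = background


screen = bytearray([0b10101010, 0b11111111, 0b00000000])
original = bytes(screen)
sprite = [(0b11110000, 0b00000110), (0b00001111, 0b01100000)]

saved = []
draw_masked(screen, 0, sprite, saved)
drawn = bytes(screen)

erase(screen, 0, saved)
restored = bytes(screen)

print(drawn.hex(), restored.hex())
```

Erasing from the saved bytes only touches the bytes each sprite covered, which is where the saving over a whole-area copy would come from.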
  • edited December 2013
    Very impressive.

    Although it's hard for me to imagine a real game where there would be such an amount of stuff moving that fast with my job to avoid it all ;)

    Where do you do the sprite pixel shifting when you move them left/right?
    I couldn't find it in the code at first glance.
  • edited December 2013
    How many sprites would you have to lose to make it full screen?
  • edited December 2013
    Timmy wrote: »
    Looks nice. 13 sprites is certainly not bad. Well done. Better than what I would ever write, for sure.

    I also looked at some code. The map decompressor and the png decoding routines are interesting too.

    Which reminds me I've also written my own simple xor, not masked sprite routine and I only used it once. I should dig it up and use it some more.

    All the code is open source. You can use it in your own project, just please put me in the credits. So now I will cite all my sources of inspiration.

    The map compressor (TmxCompress) is inspired by Einar Saukas' ZX7 compressor (cited in the comments). The map is based on Nightmare on Halloween by radastan and the sprites are taken from the Mojon Twins' Dogmole. The workflow organisation (png files, map, etc.) is inspired by the "La Churrera" engine.
  • edited December 2013
    Hikaru wrote: »
    Idk looks good enough c:
    Just one thing, isn't using LD:PUSH in a flip screen game a bit of an overkill?

    Have you tried doing it like this, say
    ;...
    	pop de
    	ld a,(hl)
    	ld (bc),a
    	inc bc
    	and e
    	or d
    	ld (hl),a
    	inc l
    	;...
    
    i.e. store the background graphics under each sprite separately, and then erase the sprites using that.

    Wow. It's exactly the code that I am thinking of using in the 128k version. The problem with 48k is the flickering: it's very difficult to avoid the electron beam while you are painting/erasing sprites.

    The solution adopted in this demo is (as you observed) a 40000-cycle screen flip every frame. This is almost 2/3 of the available time in the frame, so we must paint the sprites in the time that remains. The key to this engine is a very fast routine that prints pre-shifted sprites in only 2000 cycles each.

    In the 128k version there won't be a screen flip, and I can also put more sprites in a bigger screen area. I estimate about 1000 cycles to erase a sprite, so we have 3000 cycles to both paint and erase each one, and a limit of about 20 sprites on screen. The key in the 128k version will be to switch between the normal and shadow screens at the beginning of the frame to avoid flickering.
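
The cycle budget described above can be checked with a little arithmetic. The sketch below uses the standard frame lengths of the original machines (69888 T-states on a 48K, 70908 on a 128K) together with the approximate costs quoted in the post. It assumes the whole 128K frame is available for sprites, which is optimistic since game logic also needs time.

```python
# T-states per frame: standard figures for the original machines.
FRAME_48K = 69888    # 312 scanlines x 224 T-states
FRAME_128K = 70908   # 311 scanlines x 228 T-states

# Approximate costs quoted in the thread.
FLIP_COST = 40000    # copying the play area every frame (48K engine)
PAINT_COST = 2000    # painting one pre-shifted sprite
ERASE_COST = 1000    # estimated cost of erasing one sprite (128K plan)

flip_share = FLIP_COST / FRAME_48K
max_sprites_48k = (FRAME_48K - FLIP_COST) // PAINT_COST
spare_48k = FRAME_48K - FLIP_COST - 13 * PAINT_COST
max_sprites_128k = FRAME_128K // (PAINT_COST + ERASE_COST)

print(f"flip uses {flip_share:.0%} of a 48K frame")
print(f"48K ceiling: {max_sprites_48k} sprites, spare with 13: {spare_48k} T-states")
print(f"128K ceiling: {max_sprites_128k} sprites")
```

With these figures the 48K ceiling is 14 sprites, so the 13 in the demo leave under 4000 T-states for everything else, and the 128K ceiling of 23 is consistent with the estimate of about 20.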
  • edited December 2013
    Ralf wrote: »
    Very impressive.

    Although it's hard for me to imagine a real game where there would be such an amount of stuff moving that fast with my job to avoid it all ;)

    Where do you do the sprite pixel shifting when you move them left/right?
    I couldn't find it in the code at first glance.

    Sprites are pre-shifted externally. I store four rotated versions of each sprite in memory (the movement is in 2-pixel steps). The utility that converts sprites.png into sprite data is GfxBu.c, included in the source.
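
To make the pre-shifting idea concrete, here is a small Python sketch that produces the four 2-pixel-step versions of one sprite row. It is not a port of GfxBu.c: the function name `preshift_row` and the 16-pixel row are assumptions for illustration, and masks (which would need 1 bits shifted in) are left out.

```python
def preshift_row(row_bytes, shift):
    """Shift one row of pixels right by `shift` pixels (0..7).

    The result is one byte wider than the input so no pixels are lost.
    """
    width = len(row_bytes)
    value = int.from_bytes(row_bytes, "big") << (8 - shift)
    return value.to_bytes(width + 1, "big")


row = bytes([0b11111111, 0b00000000])      # one 16-pixel row, assumed for the example
versions = [preshift_row(row, s) for s in (0, 2, 4, 6)]

for shift, data in zip((0, 2, 4, 6), versions):
    print(shift, " ".join(f"{b:08b}" for b in data))
```

Storing the shifted copies costs memory (each row grows by a byte and is kept four times) but removes all per-pixel shifting from the drawing routine.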
  • edited December 2013
    aowen wrote: »
    How many sprites would you have to lose to make it full screen?

    I would lose all the sprites. It takes 40000 cycles to flip half of the screen area, so I would need 80000 cycles to do it full screen. This is more than one frame (69888 cycles on a 48K).
  • edited December 2013
    By the way, are you going to make a game of your own with this engine, or just stop here and leave the engine to others?

    As I said, it's hard for me to imagine a playable game where such a lot of stuff moves that fast on such a small area. It's eye hurting ;)

    But I'd really love to see someone use it wisely.
  • edited December 2013
    All the code is open source. You can use it in your own project, just please put me in the credits. So now I will cite all my sources of inspiration.
    You know how I'm usually pedantic about citing my sources too, but most of the time I ran out of memory and had to resort to putting them in the accompanying txt file instead. I'm not ready for a new sprite engine yet. But I can see this (or the 128k version) being used in Churrera.
    The map compressor (TmxCompress) is inspired by Einar Saukas' ZX7 compressor (cited in the comments). The map is based on Nightmare on Halloween by radastan and the sprites are taken from the Mojon Twins' Dogmole. The workflow organisation (png files, map, etc.) is inspired by the "La Churrera" engine.
    I'm probably most impressed by LodePNG. I was looking for a simple PNG lib without using libPNG and zlib and this is really interesting. Great find!
  • edited December 2013
    Ralf wrote: »
    By the way, are you going to make a game of your own with this engine, or just stop here and leave the engine to others?

    stop here and leave the engine to others

    Actually, I will release another engine for 128k and then also "stop there and leave the engine to others".

    I'm sorry. I am not a game maker.
    Ralf wrote: »
    As I said, it's hard for me to imagine a playable game where such a lot of stuff moves that fast on such a small area. It's eye hurting ;)

    But I'd really love to see someone use it wisely.

    The idea of the demo was to put the maximum number of sprites in a frame, which is 13. Of course, in a real game that's impossible because there will be more code, like collision detection, sound, etc.
  • edited December 2013
    Timmy wrote: »
    But I can see this (or the 128k version) being used in Churrera.

    I'll try with the 128k version. This one has a smaller action area (12x8 tiles). The Churrera has 15x10 tiles.
    Timmy wrote: »
    I'm probably most impressed by LodePNG. I was looking for a simple PNG lib without using libPNG and zlib and this is really interesting. Great find!

    Yes. At first I tried to use libPNG too and it was painful.
  • edited December 2013
    Wow. It's exactly the code that I am thinking of using in the 128k version. The problem with 48k is the flickering: it's very difficult to avoid the electron beam while you are painting/erasing sprites.

    The solution adopted in this demo is (as you observed) a 40000-cycle screen flip every frame. This is almost 2/3 of the available time in the frame, so we must paint the sprites in the time that remains. The key to this engine is a very fast routine that prints pre-shifted sprites in only 2000 cycles each.

    In the 128k version there won't be a screen flip, and I can also put more sprites in a bigger screen area. I estimate about 1000 cycles to erase a sprite, so we have 3000 cycles to both paint and erase each one, and a limit of about 20 sprites on screen. The key in the 128k version will be to switch between the normal and shadow screens at the beginning of the frame to avoid flickering.

    It shouldn't be too difficult really. For a few ideas you may or may not find useful,

    1. If you plan to leave the floating bus synchronization in like in the demo, the simplest thing to do is catch the bottom edge of the screen - i.e. the border - erase stuff, draw stuff, in that order.
    If that doesn't give enough time still, leave a single 'black' character line (32x8) between the play area and the HUD, and fill the corresponding 'pixelspace' and attributes with a single appropriate value such as #C0, so that you could, more or less reliably, look for that line instead of the bottom edge, giving you a fair bit more drawing time (this is the method used in Cobra).

    2. It's just a little more tricky for traditional interrupt-driven engines. All you need to do is separate game object logic from drawing routines, designate 2-3 buffers for 'onscreen objects' (depending on how high your play area is - each buffer corresponding to a screen third), and basically make your logic code fill out these buffers, as it processes each game object in sequence, according to a simple sorting algorithm, e.g. by checking the high byte of a given screen address. Then as you erase/draw, process each buffer in order of appearance so to speak. For most cases, this is enough to make sure the graphic output never crosses the raster even as you begin modifying the screen from the beginning of an interrupt.
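
The second suggestion (one buffer per screen third, filled by checking the high byte of the screen address) might look roughly like this in Python. The object names and addresses are invented for the example, and `third_of` only makes sense for display-file addresses in the range $4000 to $57FF.

```python
def third_of(addr):
    """Screen third (0, 1 or 2) from the high byte of a display-file address."""
    return ((addr >> 8) - 0x40) >> 3


def bucket_by_third(objects):
    """Distribute (name, screen_address) pairs into one buffer per screen third."""
    buffers = [[], [], []]
    for name, addr in objects:
        buffers[third_of(addr)].append(name)
    return buffers


# Hypothetical objects in the order the game logic happens to process them.
objects = [("bat", 0x5005), ("hero", 0x4010), ("ghost", 0x4820), ("rat", 0x4730)]

buffers = bucket_by_third(objects)
draw_order = [name for buffer in buffers for name in buffer]
print(draw_order)
```

Drawing the buffers in order means objects near the top of the screen are handled first, so the output stays ahead of the raster without a full sort.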
  • edited December 2013
    I found a small optimization in the mult8x8 macro:

    change this right at start:
            ld      d, 0
            ld      l, d
            add     hl, hl
    

    with:
            ld      hl, 0
            ld      d, l
    


    Also, this part:
            dec     b
            jr      nz, bfin
    

    I guess it can be replaced with "djnz bfin".

    I also found an instance of "ld de, hl", which doesn't exist in Z80, but I guess sjasmplus accepts it. I think it's better to stick to the standard to make it easier to port to different assemblers.
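
For a sense of scale, the savings from the two changes above can be tallied from the standard Z80 instruction timings. This Python sketch only adds up T-states and does not check that the replacements behave the same in context. One difference to keep in mind is that `djnz` leaves the flags alone while `dec b` changes them.

```python
# Standard Z80 timings in T-states.
T = {"ld d,0": 7, "ld l,d": 4, "add hl,hl": 11, "ld hl,0": 10, "ld d,l": 4, "dec b": 4}
JR_NZ = {"taken": 12, "not taken": 7}
DJNZ = {"taken": 13, "not taken": 8}

init_before = sum(T[op] for op in ("ld d,0", "ld l,d", "add hl,hl"))
init_after = sum(T[op] for op in ("ld hl,0", "ld d,l"))

loop_before = {case: T["dec b"] + cost for case, cost in JR_NZ.items()}
loop_after = dict(DJNZ)

print("init:", init_before, "->", init_after)
for case in ("taken", "not taken"):
    print(f"loop ({case}):", loop_before[case], "->", loop_after[case])
```

The `djnz` form is also one byte shorter (2 bytes instead of 3).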
  • edited December 2013
    Very interesting. I will try a mix of your two suggestions: wait for a #c0 32x8 black line, and paint the sprites ordered by Y coordinate.

    Also, I will set a limit of 8 sprites at once and, instead of making an API like SPI, use a memory array so they can be managed as if they were hardware sprites.
  • edited December 2013
    Metalbrain wrote: »
    I found a small optimization in the mult8x8 macro:

    change this right at start:
            ld      d, 0
            ld      l, d
            add     hl, hl
    

    with:
            ld      hl, 0
            ld      d, l
    


    Also, this part:
            dec     b
            jr      nz, bfin
    

    I guess it can be replaced with "djnz bfin".

    I also found an instance of "ld de, hl", which doesn't exist in Z80, but I guess sjasmplus accepts it. I think it's better to stick to the standard to make it easier to port to different assemblers.

    Yes, you are right about ld de, hl. Fixed, and I also applied the other 2 optimisations, thank you. The newest version is always here:

    http://sourceforge.net/p/emuscriptoria/code/HEAD/tree/bu%C3%B1uelera/juego.asm
  • edited December 2013
    Another optimization for mult8x8 start code. Change this:
            ld      hl, 0
            ld      d, l
          IF  data & %10000000
            add     hl, de
          ENDIF
    

    to this:
          IF  data & %10000000
            ld      h, 0
            ld      l, e
          ELSE
            ld      hl, 0
          ENDIF
            ld      d, h
    

    PS: I've just seen more optimizations, so I'm working on a better macro right now...
  • edited December 2013
    Me too
        MACRO   mult8x8 data
          IF  data & %10000000
            ld      l, 0
            ld      d, l
            ld      l, e
          ELSE
            ld      hl, 0
            ld      d, l
          ENDIF
          IF  data & %01000000
            add     hl, de
          ENDIF
          IF  data & %11000000
            add     hl, hl
          ENDIF
          IF  data & %00100000
            add     hl, de
          ENDIF
          IF  data & %11100000
            add     hl, hl
          ENDIF
          IF  data & %00010000
            add     hl, de
          ENDIF
          IF  data & %11110000
            add     hl, hl
          ENDIF
          IF  data & %00001000
            add     hl, de
          ENDIF
          IF  data & %11111000
            add     hl, hl
          ENDIF
          IF  data & %00000100
            add     hl, de
          ENDIF
          IF  data & %11111100
            add     hl, hl
          ENDIF
          IF  data & %00000010
            add     hl, de
          ENDIF
          IF  data & %11111110
            add     hl, hl
          ENDIF
          IF  data & %00000001
            add     hl, de
          ENDIF
        ENDM
    
  • edited December 2013
    Final optimization for the macro: Don't add hl,hl while hl is zero.
        MACRO   mult8x8 data
          IF  data & %10000000
            ld      h, 0
            ld      l, e
            ld      d, h
            add     hl, hl
          ELSE
            ld      hl,0
            ld      d, h
          ENDIF
          IF  data & %01000000
            add     hl, de
          ENDIF
          IF  data & %11000000
            add     hl, hl
          ENDIF
          IF  data & %00100000
            add     hl, de
          ENDIF
          IF  data & %11100000
            add     hl, hl
          ENDIF
          IF  data & %00010000
            add     hl, de
          ENDIF
          IF  data & %11110000
            add     hl, hl
          ENDIF
          IF  data & %00001000
            add     hl, de
          ENDIF
          IF  data & %11111000
            add     hl, hl
          ENDIF
          IF  data & %00000100
            add     hl, de
          ENDIF
          IF  data & %11111100
            add     hl, hl
          ENDIF
          IF  data & %00000010
            add     hl, de
          ENDIF
          IF  data & %11111110
            add     hl, hl
          ENDIF
          IF  data & %00000001
            add     hl, de
          ENDIF
        ENDM
    

    PS: Within the same minute!... but yours seems buggy if the first bit is set.
  • edited December 2013
    Metalbrain wrote: »
    PS: Within the same minute!... but yours seems buggy if the first bit is set.

    Yes, mine is buggy. The correct one is your routine.
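
One way to confirm this is to expand both macros in a small model and try every input. The Python sketch below mirrors the IF structure of the two macros posted above and interprets the handful of instructions they emit. All names (`tail`, `first_attempt`, `final`, `run`) are invented for this sketch, and registers other than E start as garbage (0xFF) on purpose.

```python
def tail(data):
    """Instructions shared by both macros after the initial load (bits 6..0)."""
    ops = []
    for bit in range(6, 0, -1):
        if data & (1 << bit):
            ops.append("add hl,de")
        if data & (0xFF << bit) & 0xFF:      # masks %11000000 .. %11111110
            ops.append("add hl,hl")
    if data & 1:
        ops.append("add hl,de")
    return ops


def first_attempt(data):
    head = ["ld l,0", "ld d,l", "ld l,e"] if data & 0x80 else ["ld hl,0", "ld d,l"]
    return head + tail(data)


def final(data):
    if data & 0x80:
        head = ["ld h,0", "ld l,e", "ld d,h", "add hl,hl"]
    else:
        head = ["ld hl,0", "ld d,h"]
    return head + tail(data)


def run(program, e):
    r = {"h": 0xFF, "l": 0xFF, "d": 0xFF, "e": e}   # everything but E is garbage
    for op in program:
        if op == "ld hl,0":
            r["h"] = r["l"] = 0
        elif op.startswith("ld "):
            dst, src = op[3:].split(",")
            r[dst] = 0 if src == "0" else r[src]
        else:                                        # add hl,hl or add hl,de
            hl = (r["h"] << 8) | r["l"]
            rhs = hl if op == "add hl,hl" else (r["d"] << 8) | r["e"]
            hl = (hl + rhs) & 0xFFFF
            r["h"], r["l"] = hl >> 8, hl & 0xFF
    return (r["h"] << 8) | r["l"]


final_ok = all(run(final(d), e) == d * e for d in range(256) for e in range(256))
broken = [d for d in range(256)
          if any(run(first_attempt(d), e) != d * e for e in range(256))]
print(final_ok, broken[0], broken[-1], len(broken))
```

In this model the first attempt fails for exactly the constants with bit 7 set, for two reasons: `ld l,0` leaves H uninitialised, and that branch has one `add hl,hl` fewer than it needs.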
  • edited December 2013
    I think the H register doesn't need to be initialized in the first case:
        MACRO   mult8x8 data
          IF  data & %10000000
            ld      d, 0
            ld      l, e
            add     hl, hl
          ELSE
            ld      hl, 0
            ld      d, l
          ENDIF
          ..
    
  • edited December 2013
    I think the H register doesn't need to be initialized in the first case

    Are you sure? I think its bit 0 will affect the final resulting bit 7. There are only 7 "add hl,hl" in the code.
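
The point about 7 shifts versus 8 can be checked with plain integer arithmetic. In this Python sketch, `leaked_bits` (a made-up helper) shows what is left of an uninitialised H after HL has been doubled a given number of times, ignoring the contribution of L and DE.

```python
def leaked_bits(h_garbage, shifts):
    """Part of a 16-bit HL that still comes from the initial H after `shifts` doublings."""
    return ((h_garbage << 8) << shifts) & 0xFFFF


after_seven = leaked_bits(0x01, 7)    # bit 0 of H ends up in bit 15 of HL
after_eight = leaked_bits(0xFF, 8)    # with 8 shifts all of H would fall off the top
print(hex(after_seven), hex(after_eight))
```

So with only seven `add hl,hl` instructions, bit 0 of the old H survives as bit 7 of the final H, which is why H has to be cleared.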
  • edited December 2013
    I think that I found a painful method to synchronize with the beam at a desired location on the screen (or at a given T-state in the frame).

    It's not completely tested but seems to work. You must mark the location like this:
    5000: 66 66 66 66 66 66 66 66 66 66 66 66 66 66 66 66
    5010: 66 66 66 66 66 66 66 66 66 66 66 66 66 66 66 66
    ..
    5100: 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
    5110: 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
    ..
    5a00: 66 66 66 66 66 66 99 99 99 99 99 99 99 99 99 99
    5a10: 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
    

    And the code to synchronize is like this:
    repet   ld      de, $6699
    repet1  ld      b, 9
    repet2  in      a, ($ff)      ;11
            cp      d             ;4
            jp      nz, repet2    ;10   25
    repet3  in      a, ($ff)      ;11
            cp      e             ;4
            jr      z, repet4     ;7
            djnz    repet3        ;13   35
            jr      repet1
    repet4
    

    A working example is here:

    http://sourceforge.net/p/emuscriptoria/code/HEAD/tree/bu%C3%B1uelera/engine48.asm
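
The control flow of the loop above can be followed more easily in a simplified model. This Python sketch treats the floating bus as a plain list of sampled bytes and ignores the real timing (one sample every 25 or 35 T-states); `find_sync` and the sample data are assumptions made for illustration only.

```python
def find_sync(samples, first=0x66, second=0x99, tries=9):
    """Return the index of the sample that ends the wait, or None if never found."""
    i = 0
    while i < len(samples):
        if samples[i] != first:          # repet2: keep reading until $66 shows up
            i += 1
            continue
        i += 1
        for _ in range(tries):           # repet3: up to B = 9 reads looking for $99
            if i >= len(samples):
                return None
            if samples[i] == second:
                return i                 # repet4: synchronized
            i += 1
        # $99 did not follow soon enough: start again (jr repet1)
    return None


near = [0xFF] * 3 + [0x66] + [0xFF] * 2 + [0x99]
far = [0x66] + [0xFF] * 9 + [0x99, 0x66, 0x99]

hit_near = find_sync(near)
hit_far = find_sync(far)
print(hit_near, hit_far)
```

In the second list the first $99 arrives too late after the first $66, so the search restarts and only the later $66, $99 pair counts.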
  • edited December 2013
    Using the floating bus, that method won't work on +3/+2A models.
  • edited December 2013
    Metalbrain wrote: »
    Are you sure? I think its bit 0 will affect the final resulting bit 7. There are only 7 "add hl,hl" in the code.

    Again you are right, I thought there were 8.
  • edited December 2013
    There's room for further improvement in mult8x8.

    If data is not zero, you should never start loading HL with zero. It should always start with value E.

    Also if there's only a single bit set in data, there's no need to initialize D since you will never need to execute ADD HL,DE.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited December 2013
    Metalbrain wrote: »
    Using the floating bus, that method won't work on +3/+2A models.

    On the +2A/+3 there is no flickering problem: we can synchronize at the beginning of the interrupt and swap between normal and shadow video memory. The idea is to first complete a 48k engine, then make the 128k engine, and finally put them together (by detecting the model and selecting the routines).
  • edited December 2013
    Not tested...
    MACRO   mult8x8 data
      IF  data = 0
        ld      hl, 0
      ELSE
        ld      h, 0
        ld      l, e
        IF  data != 1 && data != 2 && data != 4 && data != 8 && data != 16 && data != 32 && data != 64 && data != 128
          ld      d, h
        ENDIF
        IF  data & %10000000
          add     hl, hl
          IF  data & %01000000
            add     hl, de
          ENDIF
        ENDIF
        IF  data & %11000000
          add     hl, hl
          IF  data & %00100000
            add     hl, de
          ENDIF
        ENDIF
        IF  data & %11100000
          add     hl, hl
          IF  data & %00010000
            add     hl, de
          ENDIF
        ENDIF
        IF  data & %11110000
          add     hl, hl
          IF  data & %00001000
            add     hl, de
          ENDIF
        ENDIF
        IF  data & %11111000
          add     hl, hl
          IF  data & %00000100
            add     hl, de
          ENDIF
        ENDIF
        IF  data & %11111100
          add     hl, hl
          IF  data & %00000010
            add     hl, de
          ENDIF
        ENDIF
        IF  data & %11111110
          add     hl, hl
          IF  data & %00000001
            add     hl, de
          ENDIF
        ENDIF
      ENDIF
    ENDM
    
  • edited December 2013
    Thank you Einar. It works well. I just replaced <> with != for SjAsmPlus.
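
The nested macro above can be checked in the same spirit with a short Python model. The sketch below generates the instruction list the macro would emit for a given constant and evaluates it for every value of E. The function names are invented here, and H, L and D start as garbage so that any missing initialisation would show up.

```python
def einar(data):
    """Instruction list emitted by the nested macro for the constant `data`."""
    if data == 0:
        return ["ld hl,0"]
    ops = ["ld h,0", "ld l,e"]
    if data & (data - 1):                    # more than one bit set: D is needed
        ops.append("ld d,h")
    for bit in range(7, 0, -1):
        if data & (0xFF << bit) & 0xFF:      # masks %10000000 .. %11111110
            ops.append("add hl,hl")
            if data & (1 << (bit - 1)):
                ops.append("add hl,de")
    return ops


def evaluate(ops, e):
    hl, d = 0xFFFF, 0xFF                     # garbage before the macro runs
    for op in ops:
        if op == "ld hl,0":
            hl = 0
        elif op == "ld h,0":
            hl &= 0x00FF
        elif op == "ld l,e":
            hl = (hl & 0xFF00) | e
        elif op == "ld d,h":
            d = hl >> 8
        elif op == "add hl,hl":
            hl = (hl * 2) & 0xFFFF
        elif op == "add hl,de":
            hl = (hl + ((d << 8) | e)) & 0xFFFF
    return hl


all_ok = all(evaluate(einar(d), e) == d * e for d in range(256) for e in range(256))
print(all_ok, einar(1), einar(3), len(einar(128)))
```

For a power of two the expansion contains only the two loads and the shifts, so D is never touched, as suggested earlier in the thread.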
  • edited December 2013
    On the +2A/+3 there is no flickering problem: we can synchronize at the beginning of the interrupt and swap between normal and shadow video memory. The idea is to first complete a 48k engine, then make the 128k engine, and finally put them together (by detecting the model and selecting the routines).

    The floating bus does not work on all spectrum derivatives (eg the timex machines) and although a 128 back buffer method will work on those, at the expense of a second display file in 48k, I think there are probably other 48k derivatives that it won't work on. Has anyone looked at what kind of bus loading the 48k machine can handle before this method stops working reliably?

    As you can tell I don't like this trick :D But the engine is open so people are free to substitute their own method. I would personally prefer a fixed T state delay following an interrupt, possibly substituted by various fixed length tasks (eg drawing sprites) so it can be more than just a busy loop, but this is more work!