Disappearing/appearing effect

edited April 2008 in Sinclair Basic
I remember doing a small routine in BASIC which could be used to cover the attributes section randomly, until all 704 characters were filled with the attribute, thus uncovering the mono screen beneath. The first version of my routine used RND and ATTR to paint randomly at x and y if x and y weren't painted already. A very bad approach, of course. It started fast, but it would slow down near the end.
My second version used a previously saved matrix and it worked fine, but I had to load my program and then the array.
My last option was to use DATA to store the matrix and avoid the extra block of data on tape. I also made experiments with game loading screens and loading the 768 bytes randomly.
I believe that there should be a way to optimize this routine and make it work without using random coordinates or stored data. Robocop, for example, used a similar routine, but with bytes instead of characters.
I know BASIC is slow and it should be compiled or done in assembler, but I would like to learn if it's possible and how it works.
I have this feeling that it will involve some complicated math function.

Thanks.
Post edited by zxbruno on
«1

Comments

  • edited March 2008
    One way that springs to mind (and it is by no means the most efficient way - but it works!) is to set up a 768 element array containing 1-768 to represent the attribute locations, then randomly swap some around before then reading the array sequentially. This is certainly how I have pretty effectively coded card shuffling routines (though there are only 52 cards to worry about there, of course)
  • edited March 2008
    That's quite close to the Donald-Knuth-approved method, but if you want to be picky about random distributions you can try:
    moving through the pack from top to bottom, swapping each card in turn with another card from a random position in the part of the pack that has not yet been passed through (including itself).
    (where 'card/pack' = 'element/array', obviously...)
  • edited March 2008
    I used the embedded array technique for a program I sent in for a Crash competition. That was over 20 years ago and I can't remember how I generated the numbers for the array - using RND, presumably. It's quite quick, but has the disadvantage that the screen is always resolved in the same sequence. It's the "Crash Compo" program on my Spot's Pourri page, if you're curious; the resolving routine starts at line 900.

    A much more recent program I did is "Random Plot" - also on that page - which plots all the screen pixels randomly in about 4 minutes. That could be modified to "plot" attribute squares instead. It's in machine code, but the method could be copied in BASIC. The BASIC equivalent is:
    10 LET r=INT (RND*45056): LET y=INT (r/256): LET x=r-y*256: PLOT x,y: GO TO 10

    This came about following the "Random plot not so random" thread on c.s.s. back in 2005. An earlier method I devised was:
    1 DEF FN r(c,d,p)=c+(p AND c+p<=d)-(p AND c+p>d AND c-p>=-d)
    10 PLOT FN r(RND*255,255,PEEK 23672),FN r(RND*175,175,PEEK 23672)
    20 GO TO 10
    which does look rather impressive, but I don't now know why (or how) I came up with something so impenetrable.

    Either of those BASIC methods could be modified to address the attributes area. The problem with them (and the m/c version) is that they're not "intelligent", so most points are addressed more than once, which extends the completion time substantially. (The DEF FN version takes about 80 minutes.) However, as the attributes area is only 1/8th the size of the display area and the slow PLOT command wouldn't be needed, then it could be speeded up sufficiently to make it useful.

    Another routine which might be of interest is Simon Goodwin's random maze generator in Runaway Robot. This is in BASIC and rather cleverly constructs a maze which can always be completed. I'd have thought that if the maze area was redefined to be the attributes area then it could be modified to do what you want, by having 768 different types of maze element. (I haven't thought this one through very far, so it may be a mad idea.) The maze construction starts at line 700.
  • edited March 2008
    PS.
    Which reminds me that many years ago I wrote a shuffling routine in COBOL, for generating a random map from a pre-defined set of tiles. The generated maps could also always be completed (it was for a SF treasure hunt game), but I think that was more to do with how I'd constructed the tiles than anything clever in the shuffling routine. I think I still have a listing of it somewhere, if you're interested.
  • edited March 2008
    I thought that using cyphers and relatively prime numbers on an alphabet size of 768 would give a nice pseudo-random fill but after testing it out, it only gave a regular fill.
    Sorry.
  • edited March 2008
    Have you considered this:

    Make two arrays. The first is a mixed up mapping of X coordinates (0..31) mixed up. The seconds is a mixed up mapping of Y coordinates (0..23) mixed up.

    Next get a routine that, although isn't random but does map numbers 0..767 to X and Y coordinates in a mixed up way.

    But instead of using the routine on its own, map the X,Y outputs through the two arrays mentioned above.

    You should end up with a very "random looking" fill which visits each cell once.
  • edited March 2008
    This is an implementation in BASIC. It is slow but proves it.

    www.real-sense.com/chris/zx/RandomFill.z80

    A machine code version (which I can write) will be as fast as you want AND would be about 150 bytes in length (including two arrays).
  • edited March 2008
    You don't have to shuffle up an array to visit each attr square once -- simply choose a random number from the remaining number of squares left to fill, call it the "nth", and then fill the "nth" unfilled square. This last bit will be slow in basic because you have to start at the beginning of the attr area and count unfilled squares, but it is acceptably fast in asm or C.

    Basic:
    5 let r=768
    10 let a=int(rnd*r)+1
    15 for x=22528 to 23294
    20 if peek x<>0 then let a=a-1
    25 if a=0 then go to 35
    30 next x
    35 poke x,0
    40 let r=r-1
    45 if r<>0 go to 10
    

    I assume the screen starts with all attributes not equal to 0, then attr squares are randomly set to 0.

    In C:
    #include <stdlib.h>
    
    int a, r;
    char *x;
    
    main()
    {
       r = 768;
       
       do
       {
          a = (int)((rand()*(long)(r)) / RAND_MAX)+1;
          
          for (x=22528; x!=23295; ++x)
          {
             if (*x != 0) --a;
             if (a == 0) break;
          }
          
          *x = 0;
          --r;
       } while (r != 0);
    }
    

    With snapshot:
    http://www.mediafire.com/download.php?h5bfiiddhaw

    Asm would probably be around 4x faster.
  • edited March 2008
    And here is a method that is much, much better.
    #include <sys/types.h>
    
    int galLFSR(int state)
    {
       static int lfsr;
       
       if (state)
          lfsr = state;
       else
          lfsr = (lfsr ^ (-(lfsr & 1) & 0x0408)) / 2;   // taps at 10,7,1
          
       return lfsr;
    }
    
    main()
    {
       int val;
       
       galLFSR(1);                                 // set initial state to 1
       
       do
       {
          val = galLFSR(0);
          if (val <= 768)
             *((uchar *)(22527 + val)) = 0;
       } while (val != 1);                         // stop when we loop back to 1 again
    }
    

    with tap file here:
    http://www.mediafire.com/download.php?mdi9wfzhyxf

    It uses a 10-bit maximal Galois LFSR (linear feedback shift register) to step through all non-zero 10-bit values in a pseudo-random way. The program gets a 10-bit value "val" from the LFSR and if it's <= 768 uses it as a valid displacement into the attributes area. With LFSRs, each value is only visited once.

    More info on LFSRs at wikipedia:
    http://en.wikipedia.org/wiki/LFSR

    The only worthwhile link from wikipedia that explains LFSRs properly:
    http://www.newwaveinstruments.com/resources/articles/m_sequence_linear_feedback_shift_register_lfsr.htm

    I thought the simplest maximal 10-bit LFSR with taps at 10,7,1 wouldn't "look" random enough but it actually seems to. There are other maximal 10-bit taps that involve more bits that can be used to perhaps change the random "look".
  • edited March 2008
    Wow AA, that is a whole area of logic that I've not come across.
    And your implementation produces a great result.

    How wide would the bits need to be to do a pixel version of this routine?
    I guess:
    No. of pixels = 256x192 = 49152 = 1.5 * 2^15, therefore, your internal state would have to be at least 15 bits long.

    Then there is the bitmapping:
    3 bits for the inter-byte position;
    5 bits for the x char position; and
    ~8 bits for the y pixel position.

    How do you decide on the "tap" and how do you know whether a given tap will produce a long enough sequence before repeating?
  • edited March 2008
    I know this has wandered into the fantastic realms of z80 m/c, but I thought I'd have a stab at this in good ol' BASIC.

    It fills attributes with 255 - for a real-life run, you'd PEEK from a screen's attrs loaded into higher memory.

    This routine requires 1536 bytes for the setup code, and doesn't run too swiftly, but it does work. The main drawback is the 14 second delay while the arrays are being set up.

    Any BASIC gurus that fancy a challenge could have a bash at speeding it up if they liked :)
    10 DIM a$(768): 
       DIM b$(768): 
       FOR f=1 TO 256: 
       LET a$(f)=CHR$ (f-1): 
       LET a$(f+256)=CHR$ (f-1): 
       LET a$(f+512)=CHR$ (f-1): 
       LET b$(f)=CHR$ (0): 
       LET b$(f+256)=CHR$ (1): 
       LET b$(f+512)=CHR$ (2): 
       NEXT f: 
       LET c=768
    20 IF c>0 THEN LET r=INT (RND*c)+1: 
       POKE CODE a$(r)+22528+256*CODE b$(r),255: 
       LET a$=a$( TO r-1)+a$(r+1 TO ): 
       LET b$=b$( TO r-1)+b$(r+1 TO ): 
       LET c=c-1: 
       GO TO 20
    

    It's just something that came to me suddenly, so is likely to be the most sub-optimal solution there is for BASIC. Anyone like to make any advances on this?

    D.
  • edited March 2008
    Impressive!
    The math and asm part is way over my head, but I will definitely keep and share this information with others.
  • edited March 2008
    BloodBaz wrote: »
    How wide would the bits need to be to do a pixel version of this routine?
    I guess:
    No. of pixels = 256x192 = 49152 = 1.5 * 2^15, therefore, your internal state would have to be at least 15 bits long.

    Yes, 49152 different pixels, so at least that many states which means you'd need a 16-bit shift register.
    How do you decide on the "tap" and how do you know whether a given tap will produce a long enough sequence before repeating?

    Well I look up the taps in a table like all the other non-mathematicians out there :p They're important in data communication and cryptography circles so there are many tables of taps available around the web.

    The results come out of finite field mathematics, which can be used to determine all the maximal length taps for any n-bit shift register. Other taps, of course, will result in smaller cycles but they are much less desireable as their noise and random characteristics are much less ideal.

    At the end of this page:
    http://www.newwaveinstruments.com/resources/articles/m_sequence_linear_feedback_shift_register_lfsr.htm
    you'll find some links showing all the maximal taps for variously-sized shift registers.

    I tried out a 16-bit LFSR for plotting pixels using this program:
    #include <graphics.h>
    
    unsigned int galLFSR16(unsigned int state)
    {
       static unsigned int lfsr;
       
       if (state)
          lfsr = state;
       else
          lfsr = (lfsr/2) ^ (-(lfsr & 1) & 0x8805);  // taps at 16, 15, 13, 4, 1
          
       return lfsr;
    }
    
    main()
    {
       unsigned int val;
       
       galLFSR16(1);                               // set initial state to 1
       
       do
       {
          val = galLFSR16(0);
          if (val <= 49152)
             plot(val & 0xff, val / 256);
       } while (val != 1);                         // stop when we loop back to 1 again
    }
    

    with tap here:
    http://www.mediafire.com/download.php?tcnga3zyyhj

    I had to change the LFSR calculation a little as it's done on a 16-bit value inside a 16-bit variable. The idea behind the LFSR calculation is very simple: (-(lfsr&1)) creates all 1s if bit 0=1 or all 0s if bit 0=0. The mask 0x8805 selects which bits get XORed into the original lfsr>>1 (and thus determines the taps). The 0x8000 in the mask implements the loop from bit 0 to bit 15.

    It's quite quick -- I imagine an m/c version would be very short and perhaps 6x faster or so. And maybe much faster than that if you pull bits out of the LFSR to create a screen address directly as you hinted. If you try it I wouldn't mind seeing the result!

    The only downside is you have to visit all 65535 states of the 16-bit LFSR, which means 65535-49152/65535 = 25% of them are wasted. Thankfully the LFSR calculation is very simple in C and moreso in m/c so this isn't too costly!
  • edited March 2008
    It's also possible to do the LFSR thing in basic:
     5 REM 10-bit Galois LFSR with taps at 10, 7, 1
    10 DEF FN l(v,a,b) = INT ((v + (1016 AND a AND b) + (1032 AND NOT a AND b))/2)
    15 REM deterimine whether bit log2(n) is set, true if it is
    20 DEF FN t(x,n) = INT(x/n/2)<>INT(x/n)/2
    30 LET val = 1
    40 LET val = FN l(val,FN t(val,8),FN t(val,1))
    50 IF val <= 768 THEN POKE val+22527,0
    60 IF val <> 1 THEN GO TO 40
    

    although it's not as fast as the shuffled arrays.
  • edited March 2008
    They're important in data communication and cryptography circles so there are many tables of taps available around the web.

    They were quite useful in some emulators' AY white noise implementation too :)
  • edited March 2008
    OK AA,

    Let me try to get a M/C version of your C version.
    I will use the same TAP as you have. I won't be "decoding" the Y-pixel value - treat the screen as a solid 6144 (1800h) sequence.

    Taking the 16-bit value:

    If bit 15 and 14 are set then discard (the 25% waste that you mentioned)
    This leaves bits 15 and 14 as one of { 00, 01 or 10 }.
    The more bits will provde the inter-byte pixel to set.
    The remaining 11 bits define (0..2047 ) which defines the byte within the 3rd screen to set.

    {busy going away to work it all out}
  • edited March 2008
    {closing ftp connection}

    OK and here it is!!!
    I have even thrown in a "border/sound effect" version also!

    I make this assembly version run in about 3.7 seconds! (37 seconds for a straight run of 10).

    I do however gain by:
    a) Disabling interrupts
    b) Using JP rather than JR
    c) Minimising CALLs and using the BC register for register loads (LD A, B) rather than immediate loads (LD A, n)

    Source Code
    http://www.real-sense.com/chris/zx/PixelLFSR.asm

    Comple Listing (Program = 80 bytes)
    http://www.real-sense.com/chris/zx/PixelLFSR.lst

    Sample
    http://www.real-sense.com/chris/zx/PixelLFSR.z80

    Sample with border/sound effect
    http://www.real-sense.com/chris/zx/PixelLFSRwithEffect.z80

    Enjoy!
  • edited March 2008
    Here's the code for my white noise routine as used when you hit the submarine in Sub Chase (CGC entry) and way back in Willy Tournament 2004:
    ; bc = duration, d = pitch
    
    Play_Noise        ld    e,d
    _noiseloop        call  RandomBit
                      rla
                      rla
                      rla
                      rla
                      out   (254),a
    _noisedelay       dec   e
                      jr    nz,_noisedelay
                      ld    e,d
                      dec   bc
                      ld    a,b
                      or    c
                      jr    nz,_noiseloop
                      ret
    
    ; uses hl, de, a
    RandomBit:        ld    hl,(RandomSeed)
                      ld    a,h
                      rlca
                      ld    d,a
                      rlca
                      rlca
                      ld    e,a
    
                      ld    a,l
                      xor   d
                      xor   e
                      srl   a
                      rr    h
                      rr    l
                      ld    (RandomSeed),hl
                      ld    a,0
                      adc   a,0
                      ret
    
    RandomSeed:       dw    1  ; needs an initial single bit set
    

    Nothing in common with the screen routines but it's still related to LFSRs and might be useful for someone :p
  • edited March 2008
    Hehe. There's actually a bug in that code in my last post. Specifically, the DE register pair being used in the "Play_Noise" loop are being trashed in the "RandomBit" routine. At the very least that should get me a few more crap points for Sub Chase in the CGC :p
  • edited March 2008
    BloodBaz wrote: »
    {closing ftp connection}

    OK and here it is!!!
    I have even thrown in a "border/sound effect" version also!

    I make this assembly version run in about 3.7 seconds! (37 seconds for a straight run of 10).

    BloodBaz, your code leaves one (1) unfilled pixel. Good work but incomplete :P
  • edited March 2008
    Arda wrote: »
    BloodBaz, your code leaves one (1) unfilled pixel. Good work but incomplete :P

    Damn! You are right! Top row about 15 pixels in. Not sure why 'cos I start and end with "1" and it is a maximal m-sequence. The only number left is 0 (zero). Maybe that is why.

    I'll get back to you.

    {update} Yes, bit 0 of the first byte is left untouched and this is due to the fact that the LFSR does not "visit" the number "0" (every 16-bit number BUT!) (unless seeded as "0" in the first place).
    If it had to be filled, it could be done by two instructions (~4 bytes) as follows:
    LD HL, 4000h
    LD (HL), 1
    
    Hope this acceptable.
  • edited March 2008
    BloodBaz wrote: »
    If it had to be filled, it could be done by two instructions (~4 bytes) as follows:
    LD HL, 4000h
    LD (HL), 1
    
    Hope this acceptable.

    No, it is not. You're FIRED!



    :grin:
  • edited March 2008
    Just to keep Arda and Arjun happy, here is the version with the final bit filled in:

    Source Code
    http://www.real-sense.com/chris/zx/PixelLFSR2.asm

    Comple Listing (Program = 85 bytes)
    http://www.real-sense.com/chris/zx/PixelLFSR2.lst

    Sample
    http://www.real-sense.com/chris/zx/PixelLFSR2.z80

    Sample with border/sound effect
    http://www.real-sense.com/chris/zx/PixelLFSRwithEffect2.z80

    The code I used is at the top of the listing and is as follows (so I don't "blank" any other "set" bits at 4000h)
    	; Fill equivilent of "0" (bit 0 of 4000h)
    	LD HL, 4000h
    	SET 0, (HL)
    
  • edited March 2008
    Thanks!

    Today I played another game that uses a similar routine, Batman the Movie. :)
  • edited April 2008
    You can cream tiny cycles off your code by doing this...
    	INC D
    	DEC D
    	JP NZ, LOOP1
    

    ...instead of...
    	LD A, D
    	OR A
    	JP NZ, LOOP1
    

    ...which in a large loop can be quite a saving! Place your 'bit table' on a page boundry for faster indexing. And don't use the index registers. Ever.
  • edited April 2008
    But the loop still runs at the same speed! ;)
  • edited April 2008
    frobush wrote: »
    You can cream tiny cycles off your code by doing this...
    	INC D
    	DEC D
    	JP NZ, LOOP1
    

    ...instead of...
    	LD A, D
    	OR A
    	JP NZ, LOOP1
    

    ...which in a large loop can be quite a saving! Place your 'bit table' on a page boundry for faster indexing. And don't use the index registers. Ever.
    Woody wrote: »
    But the loop still runs at the same speed! ;)

    I think the man (Woody) is right, the two versions take the same T-States.

    I am using the index register to access the bit table as you say. I thought about avoiding the IX register (as it is that bit slower) but as I only "set" the IX once and "read" from it only once per loop it is relatively minor. If I used BC, HL or DE instead (with, as you say B, H or D being a constant, aligned at a 256-byte boundary and C L or E being the offset) then I would have to both "set" B H and D registers and "read from" HL DE or BC every loop.

    I will do a T-State count to see if your suggestion can be optimised. The 256-byte boundary can be a bit of an arse when integrating into future developments so using the IX register (if indeed it is slower) may be a hit worth taking.

    I appreciate your feedback and throughts though.

    Cheers.
  • edited April 2008
    I did find one optimisation however:
    Changing:
    	LD A, D
    	AND C		; C = 192
    	CP  C		; C = 192
    	JR Z, LOOP1
    
    to:
    	LD A, D
    	CP  C		; C = 192
    	JR NC, LOOP1
    
    (The C register holds the constant 192)
    This will save 65536 * 4 (or 7?) T-States.
  • edited April 2008
    BloodBaz wrote: »
    I think the man (Woody) is right, the two versions take the same T-States.

    OMG! frobush is wrong? He did post on April Fool's day so maybe he was just having us on... :wink:
  • edited April 2008
    Woody wrote: »
    But the loop still runs at the same speed! ;)

    are there a document which list every opcode and it's clock cycle count? Can you provide me with a link?

    I found some z80 cycle tables (not related to speccy), but still, maybe there are different issues regarding to Zx Spectrum.
Sign In or Register to comment.