20-column multicolour: a half-baked idea
I was writing an email reply to one Mr Jowett about multicolour routines, and in that way that explaining something to someone else makes you think "oh, hang on, that's not right" and re-evaluate what you thought you knew about the subject, I realised that it might be possible to beat the 18-column limit for 8x1-attribute multicolour.
Existing routines write each line of attributes just before the ULA reaches them. While the ULA is busy drawing the last scanline of the character row above, we're busy writing the first line of attribute data for the next row down. While the ULA is rendering that first pixel line, we're just behind it writing data for the second line, and so on.
The key insight is this: while all of this is in full swing, we only need to change each character row SEVEN times, not eight. We will need to write to it an eighth time at some point, so that we're not still showing the last line's data at the top when the next frame comes around - but we can do that in our own time (during the lower and upper border, for example).
(I realise that the lower / upper border is the only opportunity you have to do actual game logic when working with engines like ZXodus and Bifrost, so this is clearly not very practical for real-world projects. But what the hell, we have records to break here!)
So, since there are 7 writes in the space of 8 scanlines, we can over-run the usual 224/228 tstates slightly, and begin each pass slightly later for each successive line. If we start the first line at the earliest possible opportunity (which means writing the leftmost byte as soon as the ULA has passed it, and generally working left-to-right - which unfortunately means fighting against the direction of PUSHes), and finish just before the ULA catches up with us at the end of the last line, can we possibly squeeze in another column or two?
I made a naive attempt at a 20-column routine last night - churning through the PUSHes as fast as possible, without checking where the contention delays fall - and failed on the fifth line, where the ULA caught up and redrew the magenta line (see columns 12-13) before I'd had chance to paint it green:

This probably isn't too surprising - we only managed to pull off 18 columns by carefully arranging our PUSHes to minimise contention, so we probably need to do the same here. Unfortunately, since each of our seven passes begins at a different point in the contention pattern, we'll need to solve that jigsaw puzzle seven times over. I think, then, that the next step is for me to build an interactive assembler-type tool (in Javascript, no doubt?) where I can enter my instruction sequence and immediately get a graphical representation of where the contention delays fall, so that I don't have to work that out by hand all the time. Unless someone else wants to get in on the action, that is!
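As a very rough idea of what the simplest output of such a tool could look like, here is a text-mode sketch that prints the contention delay for every T-state of one scanline. It is only an illustration under assumptions: 224 T-states per line, the first 128 of them contended, and the 6,5,4,3,2,1,0,0 delay pattern quoted for the 48K. The name `contention_strip` is made up for this example.

```python
# Text-mode sketch of a "where does contention fall" view for one scanline.
# Assumptions: 224 T per line, the first 128 T of each line are contended,
# and the delays repeat as 6,5,4,3,2,1,0,0 (48K pattern).
PATTERN_48K = (6, 5, 4, 3, 2, 1, 0, 0)


def contention_strip(line_length=224, contended=128, pattern=PATTERN_48K):
    """Return one character per T-state: the delay digit, or '.' if uncontended."""
    cells = []
    for t in range(line_length):
        if t < contended:
            cells.append(str(pattern[t % len(pattern)]))
        else:
            cells.append(".")
    return "".join(cells)


strip = contention_strip()
# Print the line in chunks of 32 T-states, prefixed by the starting offset.
for start in range(0, len(strip), 32):
    print(f"{start:3d}  {strip[start:start + 32]}")
```

A real tool would overlay each instruction's memory accesses on this strip; the sketch only draws the background pattern they would have to fit into.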
Post edited by gasman on
Comments
I'm also trying to get my head around the logic of what you're suggesting, and I can't quite work out if it makes sense. I suppose the fact that you've got it to work for a few rows suggests the logic is sound, even if the timing isn't certain.
Personally I'm happy to drop to 8x2 multicolour - I can do that 24-wide with the luxury of a loop and a start pointer. And as you say, for a game I'd be unwilling to surrender too much top border time.
The other thought I had was maybe not to use all the registers on each line, and pre-load some of the alternate registers during the spare line for use later on. On each subsequent line you'd have one more register to use as its pre-loaded data gets dumped, so it would need a custom sequence for each of the seven lines rather than one repeated function. But it looks like you were considering that anyway, to get the timing right.
One thing I'd do is mix up the colour order in your test data though, as it can be hard to spot if red/magenta and green/cyan edges are stable. Try 0,4,1,5,2,6,3,7.
- IONIAN-GAMES.com -
multicolour #1 (border effects work on the Unreal emulator, Pentagon 128 timing)
http://pouet.net/prod.php?which=61545
multicolour #2 (Pentagon 128 only!)
http://zxaaa.untergrund.net/view_demo.php?id=7599
ZXodus doesn't leave you with any top border time for the game logic because it's sitting waiting to draw the tiles, which it does a couple per frame after it's updated the attributes.
That was more luck than judgement on my part as far as ZXodus is concerned. Now that Bifrost* is available I wouldn't suggest using ZXodus for any projects. The ZXodus II Engine is something else altogether but I'm keeping that for my own project.
The filling is best set up behind the beam, so that you are trying to update things almost immediately after they've been shown. You start behind the beam, but the beam is eventually catching up.
So far I have only experimented with code for uncontended machines, where all this is trivially simple to achieve. But with a 25% increase in fillrate, I'd be surprised if a 22-character-wide 8x1 multicolor was not possible even on a contended 128K machine.
But I never considered applying this mode of thinking to the 48K. It is definitely an excellent idea.
The raster takes 4 states to draw each character. Since we need to start drawing right after it has passed the first 2 characters, and end the last line before it reaches the last two, that leaves us 16*4 = 64 states to overrun in total. Divided by 7, it means we have 9.14 extra states for each line. In practical terms, this means just 8 extra states, as contention would eat any extra time beyond 8... So, the question is... would just 8 extra states per line suffice to write another pair of bytes? I guess it depends on how much free time was left with the previous 18-character routines (which I'm not familiar with).
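The same estimate, written out as a quick calculation. This is just the arithmetic from the paragraph above; the variable names are mine, and rounding down to a multiple of 8 reflects the 8-state contention period rather than anything measured.

```python
# Back-of-envelope slack estimate for a 20-column area, using the figures above.
CHAR_T = 4        # states the raster spends on one character
COLUMNS = 20      # width of the multicolour area
LEAD_IN = 2       # columns the raster must pass before we may start writing
LEAD_OUT = 2      # columns we must finish before the raster reaches them
PASSES = 7        # attribute rewrites per character row

overrun_total = (COLUMNS - LEAD_IN - LEAD_OUT) * CHAR_T  # 16 * 4
per_pass = overrun_total / PASSES                        # roughly 9.14
usable = int(per_pass) // 8 * 8                          # whole contention periods only

print(overrun_total, round(per_pass, 2), usable)
```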
Good luck! :)
Not really. Implementing game logic during upper border is a real nightmare, because the game developer would need to ensure this code always takes an exact number of cycles, regardless of what happens. Even if we had lots of multicolor games in development right now, the vast majority would not do it.
In BIFROST*, the upper border is normally used just to refresh/animate tiles. Even so, there's still some spare time left. Since updating all attributes in the multicolor area could be done in less than 5K T-states, I'm sure it would be possible to do both.
In ZXodus II, the upper border time is more heavily used for more animations, but it could be changed to play animations slightly slower, and use the remaining time to update pixel lines.
Therefore your idea would be perfectly practical for most real-world projects!
There's an approach I had in mind since last year for a 128K multicolor renderer. Perhaps it could be adapted to 48K as follows:
Of course the description above is just a rough estimate. It will take quite some effort to adjust everything properly. But I bet this strategy works.
18x18 is about the sweet spot for the ZXodus II Engine though. It updates nine tiles per frame, plays an AY tune, and handles up to 66 animated tiles on screen at once without glitching on a +2A while still leaving enough cycles for game logic. The animations are quite CPU intensive so it uses an interrupt manager to do different things on different frames to keep it all flowing nicely. If I'd written it back in the 80s I'd have made a fortune. :(
Actually it's more.
You didn't take into account the free time before the raster scan passes the first 2 characters. This extra time can be used to load initial values into register pairs AF,BC,DE,HL,AF',BC',DE',HL',IX,IY and to set SP.
Normally you would need to update 8 times each multicolor row in 8*224T, including the time to initialize all registers 8 times.
Now you will need to update 7 times each multicolor row in 64+7*224T, including the time to initialize all registers 6 times only. The remaining 224-64=160T per row would be still available to execute another initialization of all registers before the next multicolor row starts.
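To make the comparison concrete, here are the two budgets side by side. It only restates the numbers above for a 224T scanline; the variable names are invented for the example.

```python
# Time budget per multicolour row: classic 8 passes vs. the 7-pass idea.
LINE_T = 224                      # T-states per scanline on a 48K
OVERRUN_T = 64                    # total overrun allowed across the 7 passes

classic_budget = 8 * LINE_T                 # 8 passes, 8 register set-ups
seven_pass_budget = OVERRUN_T + 7 * LINE_T  # 7 passes, slightly stretched
spare = classic_budget - seven_pass_budget  # left over for an extra set-up

print(classic_budget, seven_pass_budget, spare)
```

The spare figure is the same 224-64 mentioned above, just reached from the other direction.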
Thanks!
I would be interested to implement something like BIFROST*2 using a larger 20x20 multicolor area or such, assuming this idea really works. Although putting together a new engine like this would take me a lot of time, so it certainly won't happen this year...
Based on everything I already saw, I can testify that project is awesome! :)
Yup, pretty sure the eventual solution is going to rely on lots of tactical register usage like that, particularly the use of alternate registers, since: A) loading up a bunch of registers is a nice way to fill time without incurring contention delays while the ULA is passing over the screen area; and B) if you do need to write to the screen while contention is going on, a long chain of PUSHes results in less wasted cycles than LD / PUSH / LD / PUSH...
Very clever! I was actually working with 128K timings in my initial test, as that's my 'default' platform of choice, but didn't think of using the shadow screen as well.
actually, I've already made something like this called asmp. it only supports a couple of opcodes, but the asmp tool has a built-in assembler/simulator with full contention emulation. I'm sure you'll do it better than me, but it's surprising that asmp also writes 7 of 8 raster lines, and uses the 8th one for syncing with contention timing and waiting for about 200ts :D
I couldn't find anything useful to do in this time, as we cannot lose the sync with contention :)
by the way, I don't see any other way than structuring registers cleverly and reusing them without reloading, thus sparing some time.
the spectrum only has 8 colours, and enough registers to arrange colours accordingly. I think more than 18 columns is a possibility with some help from pre-computation of registers and the 8th raster line.
There's more. Keep in mind that contention will affect your routine differently depending on the location of the multicolor area on screen. I prefer starting at column 1 as introduced by ZXodus, but you may need to choose another position to make your routine work.
There's also the choice of using IX or IY. Notice that PUSH IX is slower than PUSH DE, but LD (nn),IX takes the same time as LD (nn),DE. Actually I suspect it will be better to start your routine with LD (nn),IX to update the first 2 bytes on screen as soon as the raster scan allows it.
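For reference, here is a small table of the uncontended costs involved, with the "load a pair, then write it" totals for each route. The T-state values are the standard documented Z80 timings; contention delays are deliberately not included, and the `routes` pairings are just illustrative combinations, not taken from any existing routine.

```python
# Uncontended Z80 T-state costs for the instructions discussed above.
BASE_T = {
    "LD rr,nn": 10,      # rr = BC, DE or HL
    "LD IX,nn": 14,      # same for IY
    "PUSH rr": 11,       # rr = BC, DE, HL or AF
    "PUSH IX": 15,       # same for IY
    "LD (nn),HL": 16,
    "LD (nn),DE": 20,    # ED-prefixed form, same for BC
    "LD (nn),IX": 20,    # same for IY
}

# Cost of loading two attribute bytes and writing them, per route.
routes = {
    "LD rr,nn + PUSH rr": BASE_T["LD rr,nn"] + BASE_T["PUSH rr"],
    "LD IX,nn + PUSH IX": BASE_T["LD IX,nn"] + BASE_T["PUSH IX"],
    "LD HL,nn + LD (nn),HL": BASE_T["LD rr,nn"] + BASE_T["LD (nn),HL"],
    "LD IX,nn + LD (nn),IX": BASE_T["LD IX,nn"] + BASE_T["LD (nn),IX"],
}

for name, cost in sorted(routes.items(), key=lambda item: item[1]):
    print(f"{cost:3d}T  {name}")
```

The point of LD (nn),rr is not raw speed but that it writes low byte first and does not depend on SP, so it can hit the leftmost columns early.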
The good news is, you can assume your routine starts perfectly synchronized with the raster scan. Although an interrupt during HALT may have a variable delay from 0 to 3 T-states (or a lot more if the interrupt occurs when executing other instructions), the "anti-flickering mechanism" from BIFROST* can completely eliminate this difference. Anyway that's not something you need to worry about right now.
The "classic" multicolor algorithm takes 224 T for each raster scan pass in a Spectrum 48K. For instance, the first pass on row 1 works as follows:
However we could save 8 T simply grouping the first 2 PUSHes:
Of course we can't make this change on current 18-column multicolor implementations, otherwise we would be updating attributes too fast. But it means there's potential for saving more time for the 20-column idea...
It's not working yet, but it's VERY close. If I didn't make any mistakes, the timing will only fail in the last instruction, that was supposed to update the last pair of columns in the last raster scan!
All timings listed below are based on a Spectrum 48K with normal (non-late) timing. This routine assumes that attributes for the first raster scan were previously set during upper border, so it can start working at the second raster scan:
; --- PREPARE "PUSH AF/AF'" FOR LATER ; 15984T
    LD SP,nn ; reference AF/AF' values ; 15994T
    POP AF ; 16004T
    EX AF,AF' ; 16008T
    POP AF ; 16018T
; --- SET ATTRIBUTES FOR 2ND RASTER SCAN ---
    LD SP,nn ; reference columns 9 and 10 ; 16028T
    LD HL,nn ; 16038T
    LD DE,nn ; 16048T
    LD BC,nn ; 16058T
    EXX ; 16062T
    LD HL,nn ; 16072T
    LD DE,nn ; 16082T
    LD BC,nn ; 16092T
    LD IX,nn ; 16106T
    LD IY,nn ; 16120T
    LD (nn),IX ; columns 1 and 2 ; 16144T
    LD (nn),IY ; columns 3 and 4 ; 16168T
    PUSH BC ; columns 9 and 10 ; 16184T
    PUSH DE ; columns 7 and 8 ; 16200T
    PUSH HL ; columns 5 and 6 ; 16216T
    LD SP,nn ; reference columns 19 and 20 ; 16226T
    LD HL,nn ; 16236T
    LD DE,nn ; 16246T
    LD BC,nn ; 16256T
    PUSH BC ; columns 19 and 20 ; 16267T
    PUSH DE ; columns 17 and 18 ; 16278T
    EXX ; 16282T
    PUSH BC ; columns 15 and 16 ; 16293T
    PUSH DE ; columns 13 and 14 ; 16304T
    PUSH HL ; columns 11 and 12 ; 16315T
; --- SET ATTRIBUTES FOR 3RD RASTER SCAN ---
    LD SP,nn ; reference columns 5 and 6 ; 16325T
    LD HL,nn ; 16335T
    LD DE,nn ; 16345T
    LD BC,nn ; 16355T
    EXX ; 16359T
    LD DE,nn ; 16369T
    LD BC,nn ; 16379T
    LD IX,nn ; 16393T
    PUSH BC ; columns 5 and 6 ; 16408T
    PUSH DE ; columns 3 and 4 ; 16424T
    PUSH HL ; columns 1 and 2 ; 16440T
    LD SP,nn ; reference columns 19 and 20 ; 16450T
    LD HL,nn ; 16460T
    LD DE,nn ; 16470T
    LD BC,nn ; 16480T
    PUSH BC ; columns 19 and 20 ; 16491T
    PUSH DE ; columns 17 and 18 ; 16502T
    PUSH HL ; columns 15 and 16 ; 16513T
    EXX ; 16517T
    PUSH BC ; columns 13 and 14 ; 16528T
    PUSH DE ; columns 11 and 12 ; 16539T
    PUSH HL ; columns 9 and 10 ; 16550T
    LD HL,nn ; 16560T
    PUSH HL ; columns 7 and 8 ; 16571T
; --- SET ATTRIBUTES FOR 4TH RASTER SCAN ---
    LD HL,nn ; 16581T
    LD DE,nn ; 16591T
    LD BC,nn ; 16601T
    PUSH BC ; columns 5 and 6 ; 16616T
    PUSH DE ; columns 3 and 4 ; 16632T
    PUSH HL ; columns 1 and 2 ; 16648T
    LD SP,nn ; reference columns 19 and 20 ; 16658T
    LD HL,nn ; 16668T
    LD DE,nn ; 16678T
    LD BC,nn ; 16688T
    PUSH BC ; columns 19 and 20 ; 16704T
    PUSH DE ; columns 17 and 18 ; 16715T
    PUSH HL ; columns 15 and 16 ; 16726T
    LD HL,nn ; 16736T
    PUSH HL ; columns 13 and 14 ; 16747T
    LD HL,nn ; 16757T
    PUSH HL ; columns 11 and 12 ; 16768T
    LD HL,nn ; 16778T
    PUSH HL ; columns 9 and 10 ; 16789T
    PUSH AF ; columns 7 and 8 ; 16800T
    EX AF,AF' ; 16804T
; --- SET ATTRIBUTES FOR 5TH RASTER SCAN ---
    LD HL,nn ; 16814T
    LD DE,nn ; 16824T
    PUSH DE ; columns 5 and 6 ; 16840T
    PUSH HL ; columns 3 and 4 ; 16856T
    LD HL,nn ; 16866T
    LD DE,nn ; 16876T
    LD BC,nn ; 16886T
    EXX ; 16890T
    LD HL,nn ; 16900T
    LD DE,nn ; 16910T
    LD BC,nn ; 16920T
    PUSH BC ; columns 1 and 2 ; 16931T
    LD SP,nn ; reference columns 19 and 20 ; 16941T
    PUSH DE ; columns 19 and 20 ; 16952T
    PUSH HL ; columns 17 and 18 ; 16963T
    EXX ; 16967T
    PUSH BC ; columns 15 and 16 ; 16978T
    PUSH DE ; columns 13 and 14 ; 16989T
    PUSH HL ; columns 11 and 12 ; 17000T
    LD HL,nn ; 17010T
    PUSH HL ; columns 9 and 10 ; 17021T
    LD HL,nn ; 17031T
    PUSH HL ; columns 7 and 8 ; 17048T
; --- SET ATTRIBUTES FOR 6TH RASTER SCAN ---
    LD HL,nn ; 17058T
    LD DE,nn ; 17068T
    LD BC,nn ; 17078T
    EXX ; 17082T
    LD HL,nn ; 17092T
    LD DE,nn ; 17102T
    LD BC,nn ; 17112T
    PUSH BC ; columns 5 and 6 ; 17128T
    PUSH DE ; columns 3 and 4 ; 17144T
    PUSH HL ; columns 1 and 2 ; 17155T
    LD SP,nn ; reference columns 17 and 18 ; 17165T
    LD HL,nn ; 17175T
    PUSH HL ; columns 17 and 18 ; 17186T
    EXX ; 17190T
    PUSH BC ; columns 15 and 16 ; 17201T
    PUSH DE ; columns 13 and 14 ; 17212T
    PUSH HL ; columns 11 and 12 ; 17223T
    PUSH IX ; columns 9 and 10 ; 17238T
    PUSH AF ; columns 7 and 8 ; 17249T
    LD HL,nn ; 17259T
    LD (nn),HL ; columns 19 and 20 ; 17280T
; --- SET ATTRIBUTES FOR 7TH RASTER SCAN ---
    LD HL,nn ; 17290T
    LD DE,nn ; 17300T
    LD BC,nn ; 17310T
    EXX ; 17314T
    LD HL,nn ; 17324T
    LD DE,nn ; 17334T
    LD BC,nn ; 17344T
    PUSH BC ; columns 5 and 6 ; 17360T
    LD BC,nn ; 17370T
    PUSH BC ; columns 3 and 4 ; 17381T
    PUSH DE ; columns 1 and 2 ; 17392T
    LD SP,nn ; reference columns 17 and 18 ; 17402T
    PUSH HL ; columns 17 and 18 ; 17413T
    EXX ; 17417T
    PUSH BC ; columns 15 and 16 ; 17428T
    PUSH DE ; columns 13 and 14 ; 17439T
    PUSH HL ; columns 11 and 12 ; 17450T
    LD HL,nn ; 17460T
    PUSH HL ; columns 9 and 10 ; 17471T
    LD HL,nn ; 17481T
    PUSH HL ; columns 7 and 8 ; 17496T
    LD HL,nn ; 17506T
    LD (nn),HL ; columns 19 and 20 ; 17528T
; --- SET ATTRIBUTES FOR 8TH RASTER SCAN ---
    LD HL,nn ; 17538T
    LD DE,nn ; 17548T
    LD BC,nn ; 17558T
    EXX ; 17562T
    LD HL,nn ; HL' (alternate set) ; 17572T
    LD DE,nn ; DE' (alternate set) ; 17582T
    LD BC,nn ; BC' (alternate set) ; 17592T
    PUSH BC ; columns 5 and 6 ; 17603T
    PUSH DE ; columns 3 and 4 ; 17614T
    PUSH HL ; columns 1 and 2 ; 17625T
    LD SP,nn ; reference columns 15 and 16 ; 17635T
    EXX ; 17639T
    PUSH BC ; columns 15 and 16 ; 17650T
    PUSH DE ; columns 13 and 14 ; 17661T
    PUSH HL ; columns 11 and 12 ; 17672T
    LD HL,nn ; 17682T
    PUSH HL ; columns 9 and 10 ; 17693T
    LD HL,nn ; 17703T
    PUSH HL ; columns 7 and 8 ; 17720T
    LD HL,nn ; 17730T
    LD (nn),HL ; columns 17 and 18 ; 17752T
    LD HL,nn ; 17762T
    LD (nn),HL ; OOPS!!! TOO LATE FOR COLUMNS 19 AND 20
Now back to the drawing board...
Really those should be referred to as cool and warm timings. ULA-based Spectrums all start off with cool timings and will eventually drift to warm timings if left on long enough. Another reason why I'm not bothering to support them with the ZXodus II Engine.
Well, I believe you have seen a preview version of it already. If you've got a project you want to use it for let me know. I'm willing to license it on a case by case basis. It will remain proprietary closed source though.
; --- PREPARE "PUSH AF/AF'" FOR LATER ; 15984T
    LD SP,nn ; reference AF/AF' values ; 15994T
    POP AF ; 16004T
    EX AF,AF' ; 16008T
    POP AF ; 16018T
; --- SET ATTRIBUTES FOR 2ND RASTER SCAN ---
    LD SP,nn ; reference columns 5 and 6 ; 16028T
    LD HL,nn ; 16038T
    LD DE,nn ; 16048T
    LD BC,nn ; 16058T
    EXX ; 16062T
    LD HL,nn ; 16072T
    LD DE,nn ; 16082T
    LD BC,nn ; 16092T
    LD IX,nn ; 16106T
    LD IY,nn ; 16120T
    LD (nn),IX ; columns 1 and 2 ; 16144T
    PUSH IY ; columns 5 and 6 ; 16168T
    PUSH BC ; columns 3 and 4 ; 16184T
    LD SP,nn ; reference columns 19 and 20 ; 16194T
    LD IX,nn ; 16208T
    PUSH DE ; columns 19 and 20 ; 16224T
    LD DE,nn ; 16234T
    LD BC,nn ; 16244T
    PUSH BC ; columns 17 and 18 ; 16259T
    PUSH DE ; columns 15 and 16 ; 16270T
    PUSH HL ; columns 13 and 14 ; 16281T
    LD HL,nn ; 16291T
    LD DE,nn ; 16301T
    LD BC,nn ; 16311T
    PUSH BC ; columns 11 and 12 ; 16322T
    PUSH DE ; columns 9 and 10 ; 16333T
    PUSH HL ; columns 7 and 8 ; 16344T
; --- SET ATTRIBUTES FOR 3RD RASTER SCAN ---
    LD HL,nn ; 16354T
    LD DE,nn ; 16364T
    LD BC,nn ; 16374T
    PUSH BC ; columns 5 and 6 ; 16392T
    PUSH DE ; columns 3 and 4 ; 16408T
    PUSH HL ; columns 1 and 2 ; 16424T
    LD IY,nn ; 16438T
    LD SP,nn ; reference columns 19 and 20 ; 16448T
    LD HL,nn ; 16458T
    LD DE,nn ; 16468T
    LD BC,nn ; 16478T
    PUSH BC ; columns 19 and 20 ; 16489T
    PUSH DE ; columns 17 and 18 ; 16500T
    PUSH HL ; columns 15 and 16 ; 16511T
    EXX ; 16515T
    PUSH BC ; columns 13 and 14 ; 16526T
    PUSH DE ; columns 11 and 12 ; 16537T
    PUSH HL ; columns 9 and 10 ; 16548T
    LD HL,nn ; 16558T
    PUSH HL ; columns 7 and 8 ; 16569T
; --- SET ATTRIBUTES FOR 4TH RASTER SCAN ---
    LD HL,nn ; 16579T
    LD DE,nn ; 16589T
    LD BC,nn ; 16599T
    EXX ; 16603T
    LD HL,nn ; 16613T
    LD DE,nn ; 16623T
    LD BC,nn ; 16633T
    PUSH BC ; columns 5 and 6 ; 16648T
    PUSH DE ; columns 3 and 4 ; 16664T
    PUSH HL ; columns 1 and 2 ; 16680T
    LD SP,nn ; reference columns 19 and 20 ; 16690T
    LD HL,nn ; 16700T
    LD DE,nn ; 16710T
    LD BC,nn ; 16720T
    PUSH BC ; columns 19 and 20 ; 16731T
    PUSH DE ; columns 17 and 18 ; 16742T
    PUSH HL ; columns 15 and 16 ; 16753T
    EXX ; 16757T
    PUSH BC ; columns 13 and 14 ; 16768T
    PUSH DE ; columns 11 and 12 ; 16779T
    PUSH HL ; columns 9 and 10 ; 16790T
    PUSH AF ; columns 7 and 8 ; 16801T
; --- SET ATTRIBUTES FOR 5TH RASTER SCAN ---
    LD HL,nn ; 16811T
    LD DE,nn ; 16821T
    LD BC,nn ; 16831T
    EXX ; 16835T
    LD HL,nn ; 16845T
    LD DE,nn ; 16855T
    LD BC,nn ; 16865T
    PUSH BC ; columns 5 and 6 ; 16880T
    PUSH DE ; columns 3 and 4 ; 16896T
    LD DE,nn ; 16906T
    LD BC,nn ; 16916T
    PUSH IY ; columns 1 and 2 ; 16931T
    LD SP,nn ; reference columns 19 and 20 ; 16941T
    PUSH BC ; columns 19 and 20 ; 16952T
    PUSH DE ; columns 17 and 18 ; 16963T
    PUSH HL ; columns 15 and 16 ; 16974T
    EXX ; 16978T
    PUSH BC ; columns 13 and 14 ; 16989T
    PUSH DE ; columns 11 and 12 ; 17000T
    PUSH HL ; columns 9 and 10 ; 17011T
    LD HL,nn ; 17021T
    EX AF,AF' ; 17025T
    PUSH HL ; columns 7 and 8 ; 17040T
; --- SET ATTRIBUTES FOR 6TH RASTER SCAN ---
    LD HL,nn ; 17050T
    LD DE,nn ; 17060T
    LD BC,nn ; 17070T
    EXX ; 17074T
    LD HL,nn ; 17084T
    LD DE,nn ; 17094T
    LD BC,nn ; 17104T
    PUSH BC ; columns 5 and 6 ; 17120T
    PUSH DE ; columns 3 and 4 ; 17136T
    LD DE,nn ; 17146T
    LD BC,nn ; 17156T
    PUSH BC ; columns 1 and 2 ; 17167T
    LD SP,nn ; reference columns 19 and 20 ; 17177T
    PUSH DE ; columns 19 and 20 ; 17188T
    PUSH HL ; columns 17 and 18 ; 17199T
    EXX ; 17203T
    PUSH BC ; columns 15 and 16 ; 17214T
    PUSH DE ; columns 13 and 14 ; 17225T
    PUSH HL ; columns 11 and 12 ; 17236T
    PUSH AF ; columns 9 and 10 ; 17247T
    LD HL,nn ; 17257T
    PUSH HL ; columns 7 and 8 ; 17272T
; --- SET ATTRIBUTES FOR 7TH RASTER SCAN ---
    LD HL,nn ; 17282T
    LD DE,nn ; 17292T
    LD BC,nn ; 17302T
    EXX ; 17306T
    LD HL,nn ; 17316T
    LD DE,nn ; 17326T
    LD BC,nn ; 17336T
    PUSH BC ; columns 5 and 6 ; 17352T
    LD BC,nn ; 17362T
    PUSH IX ; columns 3 and 4 ; 17379T
    PUSH BC ; columns 1 and 2 ; 17390T
    LD SP,nn ; reference columns 17 and 18 ; 17400T
    PUSH DE ; columns 17 and 18 ; 17411T
    PUSH HL ; columns 15 and 16 ; 17422T
    EXX ; 17426T
    PUSH BC ; columns 13 and 14 ; 17437T
    PUSH DE ; columns 11 and 12 ; 17448T
    PUSH HL ; columns 9 and 10 ; 17459T
    LD HL,nn ; 17469T
    LD DE,nn ; 17479T
    PUSH DE ; columns 7 and 8 ; 17496T
    LD DE,nn ; 17506T
    LD BC,nn ; 17516T
    LD (nn),HL ; columns 19 and 20 ; 17536T
; --- SET ATTRIBUTES FOR 8TH RASTER SCAN ---
    LD HL,nn ; 17546T
    EXX ; 17550T
    LD HL,nn ; 17560T
    PUSH HL ; columns 5 and 6 ; 17576T
    LD HL,nn ; 17586T
    LD DE,nn ; 17596T
    LD BC,nn ; 17606T
    PUSH BC ; columns 3 and 4 ; 17617T
    PUSH DE ; columns 1 and 2 ; 17628T
    LD SP,nn ; reference columns 15 and 16 ; 17638T
    PUSH HL ; columns 15 and 16 ; 17649T
    EXX ; 17653T
    PUSH BC ; columns 13 and 14 ; 17664T
    PUSH DE ; columns 11 and 12 ; 17675T
    PUSH HL ; columns 9 and 10 ; 17686T
    LD HL,nn ; 17696T
    PUSH HL ; columns 7 and 8 ; 17712T
    LD HL,nn ; 17722T
    LD (nn),HL ; columns 17 and 18 ; 17744T
    LD HL,nn ; 17754T
    LD (nn),HL ; columns 19 and 20 ; 17776T
Once again, all timings are based on a Spectrum 48K with normal (non-late) timing. Total execution time is 17776T - 15984T = 1792T = 8 * 224T, so each execution cycle should finish exactly on time to start the following cycle.
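A quick sanity check of that total, using the T-state stamps at the end of each section of the listing. The numbers are copied from the comments in the listing; the script only does the subtraction and shows how unevenly the time is shared between passes.

```python
# Section boundaries taken from the timing comments in the listing above.
LINE_T = 224
boundaries = [
    ("start", 15984),
    ("prepare AF/AF'", 16018),
    ("2nd raster scan", 16344),
    ("3rd raster scan", 16569),
    ("4th raster scan", 16801),
    ("5th raster scan", 17040),
    ("6th raster scan", 17272),
    ("7th raster scan", 17536),
    ("8th raster scan", 17776),
]

durations = {}
for (_, begin), (name, end) in zip(boundaries, boundaries[1:]):
    durations[name] = end - begin
    print(f"{name:16s} {end - begin:4d}T")

total = boundaries[-1][1] - boundaries[0][1]
print("total", total, "=", total // LINE_T, "scanlines")
```

None of the individual passes is exactly 224T; only the sum has to land on a whole number of scanlines.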
I'm guessing exactly the same code will work just fine on a Spectrum 128K/+2, but I have no idea about the Spectrum +2A/+3. As a matter of fact, I didn't even emulate this code in a Spectrum 48K yet, so it's perfectly possible I made a silly mistake somewhere :)
You've got more free t-states per frame on the +2A/+3 than on the other machines so I'd actually start by trying to get it to work on those and then see if it will run on the 48. You may run into problems on the ULA 128s owing to the I/O contention.
The number of T-states per frame is not important. The real problem is the limited number of T-states per raster scan line (224T for a Spectrum 48K, 228T for others).
For this reason, it would never work to plan everything for 228T and hope it would also work for 224T. It has to be the opposite. At first glance, it seems the extra delays due to contention will make the same routine also work on a Spectrum 128K/+2, but I will need some time to test everything properly...
True. The I/O contention would only be an issue if you were using the shadow VRAM.
ULA contention will iron out kinks between 224 and 228 Ts. You just need to start drawing the attributes after a different number of Ts.
Not necessarily.
Delays due to memory contention in a Spectrum 48K follow a pattern like this: 6,5,4,3,2,1,0,0,6,5,4,3,2,1,0,0,6,5,...
Delays due to memory contention in a Spectrum +2A/+3 follow a pattern like this: 7,6,5,4,3,2,1,0,7,6,5,4,3,2,1,0,7,6,...
The "standard" 18 columns multicolor routine executes only 2 PUSHes per raster scan line. Each of these instructions typically takes 1T or 2T longer to execute in a Spectrum +2A/+3, but since there are an extra 4T per raster scan line (228T instead of 224T), this is not a problem.
Unfortunately it's impossible to implement a 20 columns multicolor routine without executing a lot more accesses during contention, so I'm afraid a Spectrum +2A/+3 may require a completely different routine...
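To see how the two delay patterns quoted above change the cost of a single PUSH into screen memory, here is a small model. It is a sketch under several assumptions that are not stated in this thread: the code runs from uncontended RAM, PUSH spends 5T on fetch and internal work before two 3T writes, contention covers the first 128T of each of 192 lines, and 48K contention starts at T-state 14335 (the commonly quoted figure). The function names are invented for the example.

```python
PATTERN_48K = (6, 5, 4, 3, 2, 1, 0, 0)
PATTERN_PLUS3 = (7, 6, 5, 4, 3, 2, 1, 0)


def delay(t, first_contended, line_length, pattern, window=128, lines=192):
    """Contention delay for a screen-memory access arriving at T-state t."""
    offset = t - first_contended
    if offset < 0 or offset // line_length >= lines:
        return 0                      # border time: no contention
    pos = offset % line_length
    if pos >= window:
        return 0                      # right border / retrace: no contention
    return pattern[pos % len(pattern)]


def push_cost(t_start, first_contended=14335, line_length=224,
              pattern=PATTERN_48K):
    """T-states taken by PUSH rr when SP points into screen memory."""
    t = t_start + 5                   # opcode fetch (4T) + internal cycle (1T)
    for _ in range(2):                # two contended byte writes of 3T each
        t += delay(t, first_contended, line_length, pattern) + 3
    return t - t_start


# Cost by the phase at which the first write arrives, for both patterns.
# first_contended=0 is an arbitrary alignment chosen only for comparison.
for phase in range(8):
    c48 = push_cost(224 + phase - 5, 0, 224, PATTERN_48K)
    cp3 = push_cost(228 + phase - 5, 0, 228, PATTERN_PLUS3)
    print(f"phase {phase}: 48K {c48}T   +2A/+3 {cp3}T")
```

With the 48K defaults this reproduces two figures from the listings earlier in the thread: a PUSH starting at 16168T costs 16T, and one starting at 16256T (outside the contended window) costs the plain 11T. The real +2A/+3 alignment relative to the frame is not modelled here.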
I'll be interested to see if this in fact proves to be the case.
Leave at least one row of ordinary attributes at the top of the screen. ZXODUS does this. I don't do it in Buzzsaw+ but my other multicolour experiments do. You can always find a way to use more upper border time.
Then, whilst you're in that contended time of the first character row, do something like copy one or two rows of attributes (or even do a whole character row's worth of copying) on the first row of multicoloured characters before the raster gets there. Then, do them again properly as the raster actually arrives. That first pass will iron out all the kinks in timing coming from a delayed interrupt, then your first line of multicolour will work just as well as all the repeated lines.
Me too :)
Thanks for the suggestion, but there's a better way. Executing a single instruction ld hl,($4000) at the end of the contention period on each raster scan line will provide a similar result.
This is the main idea behind the "anti-flickering mechanism" implemented in BIFROST*. This saves a lot of bytes, and you could even use the remaining time on each of these raster scan lines to do something useful if you need.
In BIFROST* source code, search for comment "synchronize with the raster beam" to find the relevant piece of code.
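A rough illustration of why a single contended access can absorb the 0 to 3 T-state jitter, as I understand it: whatever the arrival time within the "counting down" part of the pattern, the ULA holds the CPU until the same release point. This models only one access with the 48K pattern; the `nominal` position is an arbitrary example near the end of the 128T contended window, and none of it is taken from the BIFROST* source.

```python
PATTERN_48K = (6, 5, 4, 3, 2, 1, 0, 0)


def resume_time(arrival, pattern=PATTERN_48K):
    """T-state (relative to the start of the line) at which a contended
    access arriving at `arrival` is finally allowed to proceed."""
    return arrival + pattern[arrival % len(pattern)]


nominal = 120                      # hypothetical arrival with no interrupt jitter
finishes = {resume_time(nominal + jitter) for jitter in range(4)}
print(finishes)                    # a single value means the jitter is gone
```

If the access arrived in the 0,0 part of the pattern instead, nothing would be absorbed, which is presumably why the placement of that instruction matters.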
Tested on both Spectrum 48K and Spectrum 128K, obtaining the following result:
Unfortunately it's not working (yet) on a Spectrum +3:
In this case, the last 2 columns during the last raster scan line are not updated fast enough due to (different) contention. Because of this, every 8th pixel line shows the same attribute from the pixel line above (PAPER black instead of blue).