More colours II (Was: Most popular new features?)


Comments

  • edited November 2009
    Just one thing...

    For the real hardware, which I hope some day gets built (I'll build the damned thing if no one else does :-)) - why an odd numbered port?

    Some discussions have been going on in speccy.org. The issue: IORQ doesn't actually arrive at the ULA, it only arrives combined with the A0 signal (yes, I missed it too - but there's a transistor hiding on the schematic which prevents IORQ going low at the ULA unless A0 is low too). Therefore, for the ULA+ to be able to decode the port, you either have to use a slightly non-standard approach to decoding it (instead of IORQ low + port number + M1 high, you look for MREQ high + port number + M1 high) or solder a wire between the CPU and the ULA+. (I expect the "slightly non-standard" port decoding, recognising an I/O cycle by seeing an address, WR low or RD low, and MREQ and M1 both high, will work fine. Note that you need to pay attention to M1 on all I/O cycles to avoid accidentally decoding an interrupt acknowledge: interrupt acknowledge uses a special M1 cycle which looks like an I/O cycle, but with M1 also going active. If any issues occur, this is where they will occur - on machines whose Z80 has a faulty M1 line, since the interrupt acknowledge will look like an I/O cycle to the ULA+ if decoding is done this way.)
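As a rough illustration of the non-standard decode described above, here is a minimal Python sketch. The function and parameter names are hypothetical, and the control lines are modelled as active-low integers (0 = asserted); it is only a truth-table model of the idea, not anyone's actual CPLD logic.

```python
def is_io_cycle(mreq_n: int, m1_n: int, rd_n: int, wr_n: int) -> bool:
    """Recognise an I/O cycle without seeing IORQ.

    All inputs are active-low: 0 means the Z80 is asserting the line.
    """
    transfer = rd_n == 0 or wr_n == 0               # RD low or WR low
    return transfer and mreq_n == 1 and m1_n == 1   # MREQ and M1 both high


def ula_plus_selected(address: int, port: int, mreq_n: int, m1_n: int,
                      rd_n: int, wr_n: int) -> bool:
    """True when the 16-bit address matches during a recognised I/O cycle."""
    return address == port and is_io_cycle(mreq_n, m1_n, rd_n, wr_n)
```

In this model a memory read (MREQ low) and an opcode fetch (MREQ and M1 low) are both rejected, and an interrupt acknowledge drives M1 low, so the M1 test rejects that too.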

    Presumably, the plan for real hardware would be to do a full 16 bit decode of the ports being used, rather than just something based on A14 and A15 (the only two address lines that arrive directly at the ULA).

    I'm just kicking out the idea here, I may have missed some of the conversation in irc that already dealt with this :-)
  • edited November 2009
    Winston wrote: »
    Just one thing...why an odd numbered port?

    It was the only one available -- everything else is taken. A side effect is that you need to force the ZX Printer to be fully decoded on port #xxFB or it will clash. There's a little hardware project for you. :)
    Presumably, the plan for real hardware would be to do a full 16 bit decode of the ports being used, rather than just something based on A14 and A15 (the only two address lines that arrive directly at the ULA).

    In the Harlequin full decoding. In the replacement ULA only partial, because otherwise you will need wires to the Z80. CSmith assures me that using an odd numbered port isn't a problem.
  • edited November 2009
    aowen wrote: »
    In the Harlequin full decoding. In the replacement ULA only partial, because otherwise you will need wires to the Z80. CSmith assures me that using an odd numbered port isn't a problem.

    I don't think you'd have to - all address lines do reach the ULA even if some of them only get there via a pair of multiplexors (it means that it'd need a little bit of logic during an I/O cycle to twiddle the multiplexors to get the values of A0-A13). The ULA after all controls these multiplexors in the first place (and I think it'll need a contention cycle if the ULA is busy reading lower memory, if the port to be used is already contended, it gets rid of this problem).

    My thoughts are to avoid partial decodes with new hardware at any cost. The I/O space is already pretty full, and only decoding A15/A14/A0 means an almost certain collision with something (decoding A0 = 1 means the LSB already collides with everything, and if the MSB is 10xx xxxx in binary it collides with the Spectranet - not that the Spectranet cares, but it means every time it changes memory pages the palette will change!)
  • edited November 2009
    Winston wrote: »
    My thoughts are to avoid partial decodes with new hardware at any cost. The I/O space is already pretty full, and only decoding A15/A14/A0 means an almost certain collision with something (decoding A0 = 1 means the LSB already collides with everything, and if the MSB is 10xx xxxx in binary it collides with the Spectranet - not that the Spectranet cares, but it means every time it changes memory pages the palette will change!)

    The I/O space is not only full, it's overloaded. There is nothing that doesn't clash with something. Using Philip Kendall's handy port comparison tool I was able to determine that port #xx3B only collides with the ZX Printer. If you fully decode the ZX Printer (on port #xxFB) then nothing else in that database uses the #xx3B space. When fully decoded, ULAplus uses only #FF3B and #BF3B. That still leaves a decent range of ports for additional hardware in the #xx3B range.
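The clash described above can be modelled with a mask/value pair per device. This is only a sketch: the `Decoder` type and its names are hypothetical, and the assumption that a stock ZX Printer responds whenever A2 is low is mine rather than something stated in the thread.

```python
from typing import List, NamedTuple


class Decoder(NamedTuple):
    name: str
    mask: int    # address bits the device actually looks at
    value: int   # state those bits must have for the device to respond

    def responds(self, port: int) -> bool:
        return (port & self.mask) == self.value


# Assumption: the unmodified ZX Printer only checks that A2 is low.
ZX_PRINTER_PARTIAL = Decoder("ZX Printer (A2 low)", 0x0004, 0x0000)
# ZX Printer forced to decode the whole low byte, i.e. port #xxFB.
ZX_PRINTER_FULL = Decoder("ZX Printer (#xxFB)", 0x00FF, 0x00FB)
# ULAplus fully decoded on the two ports named in the post.
ULAPLUS_BF3B = Decoder("ULAplus #BF3B", 0xFFFF, 0xBF3B)
ULAPLUS_FF3B = Decoder("ULAplus #FF3B", 0xFFFF, 0xFF3B)


def clashes(port: int, decoders: List[Decoder]) -> List[str]:
    """Names of every device that would respond to this port."""
    return [d.name for d in decoders if d.responds(port)]
```

Calling `clashes(0xBF3B, [ZX_PRINTER_PARTIAL, ULAPLUS_BF3B, ULAPLUS_FF3B])` lists both the printer and the ULAplus, while swapping in `ZX_PRINTER_FULL` leaves only the ULAplus.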
  • edited November 2009
    Winston wrote: »
    Just one thing...

    For the real hardware, which I hope some day gets built (I'll build the damned thing if no one else does :-)) - why an odd numbered port?

    Some discussions have been going on in speccy.org. The issue: IORQ doesn't actually arrive at the ULA, it only arrives combined with the A0 signal...

    Presumably, the plan for real hardware would be to do a full 16 bit decode of the ports being used, rather than just something based on A14 and A15 (the only two address lines that arrive directly at the ULA).

    Oh. Ta Winston!

    Andrew and I spoke on the phone for ages about this, and thought we'd got it sewn up. Somewhere along the lines, the A0 issue has been "lost in translation" between us - I of course know about the A0 implications. Bum.

    With the ULA replacement, you can decode A0-A7, A14 and A15, which led to the port addresses specified in the 64 colour specification, as all these lines are available at the ULA. However, the /IORQ issue has been forgotten about :-(

    Changing the ports to be 0xBF3A and 0xFF3A won't be a problem other than causing a read and write to the usual ULA port if you're not using a ULAplus.
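To make the even-port proposal concrete, here is a small sketch of a partial decode that looks only at A0-A7, A14 and A15, alongside the stock ULA's behaviour of answering any port with A0 low. The mask constant and function names are hypothetical.

```python
from typing import Optional

DECODE_MASK = 0xC0FF   # A15, A14 and A7..A0: the lines available at the ULA


def ula_plus_port(port: int) -> Optional[str]:
    """Classify a port under the partial decode, or return None if no match."""
    key = port & DECODE_MASK
    if key == (0xBF3A & DECODE_MASK):
        return "register"
    if key == (0xFF3A & DECODE_MASK):
        return "data"
    return None


def stock_ula_selected(port: int) -> bool:
    """An unmodified ULA responds to every port with A0 low."""
    return (port & 0x0001) == 0
```

Because A8-A13 are ignored in this model, `0x803A` aliases the register port just as `0xBF3A` does, and on a machine without a ULAplus both proposed ports land on the ordinary ULA port.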

    So I suggest the following.
    Adding a subgroup of 3f to the mode group which, when read, returns the firmware issue, and subgroup of 3e returns a magic number.

    So out (0xBF3A), 7E [01 111110] selects the magic number register, and IN 0xFF3A returns the magic number.

    So out (0xBF3A), 7F [01 111111] selects the firmware register, and IN 0xFF3A returns the firmware version.
    Magic Number    b7-b2 = 111010 = 3A
                    b1-b0 = Top 2 bits of vendor
    
    Vendor/Version  b7-b6 = Bottom 2 bits of vendor
                    b5-b3 = Major version number
                    b2-b0 = Minor version number
    

    Vendor is thus 0-15, Major Version 0-7, minor version 0-7
    I declare that vendor 1010 will be me (ZXDesign), emulators can have their own vendor code. 0001 = Spin etc.
    I will administer these, particularly as I am the originator of the firmware. I will be no doubt involved in any other vendor releases of the firmware on the electronics side, so it seems sensible to me.
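The two-byte layout proposed above can be sketched as a pack/unpack pair. The function names are hypothetical and this is only a model of the bit fields as listed in the post.

```python
from typing import Optional, Tuple

MAGIC = 0b111010   # 0x3A, carried in b7-b2 of the magic number byte


def pack_id(vendor: int, major: int, minor: int) -> Tuple[int, int]:
    """Return (magic_byte, version_byte) for a 4-bit vendor and 3-bit versions."""
    if not (0 <= vendor <= 15 and 0 <= major <= 7 and 0 <= minor <= 7):
        raise ValueError("vendor is 0-15, major and minor are 0-7")
    magic_byte = (MAGIC << 2) | (vendor >> 2)                       # top 2 vendor bits
    version_byte = ((vendor & 0b11) << 6) | (major << 3) | minor    # bottom 2 vendor bits
    return magic_byte, version_byte


def unpack_id(magic_byte: int, version_byte: int) -> Optional[Tuple[int, int, int]]:
    """Return (vendor, major, minor), or None if the magic number is absent."""
    if (magic_byte >> 2) != MAGIC:
        return None
    vendor = ((magic_byte & 0b11) << 2) | (version_byte >> 6)
    major = (version_byte >> 3) & 0b111
    minor = version_byte & 0b111
    return vendor, major, minor
```

For vendor 1010 with version 1.2, `pack_id(0b1010, 1, 2)` gives the bytes `0xEA` and `0x8A`.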

    Software should perform the magic number check once, at startup, to determine that a ULAplus is present (or emulated):
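    LD   BC,0BF3Ah        ; register select port
    LD   A,7Eh            ; 01 111110 = magic number register
    OUT  (C),A
    LD   B,0FFh           ; BC = FF3Ah, the data port
    IN   A,(C)
    SRL  A
    SRL  A                ; discard b1-b0 (top two vendor bits)
    CP   3Ah              ; b7-b2 should read 111010
    JR   Z,ULAPlusPresent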
    

    If software tries to write to the palette without a ULAplus present, the border will flash and the speaker will click etc. (the A0 issue). I recommend reading the magic number instead of any other method, as software needs the protection of reading the firmware version (see next).
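The startup check can also be modelled from an emulator's point of view. This is a hedged sketch: the two machine classes are hypothetical stand-ins, and the value `0xFF` returned when no ULAplus answers is an assumption made for illustration only.

```python
REGISTER_PORT = 0xBF3A
DATA_PORT = 0xFF3A
MAGIC_SELECT = 0x7E     # 01 111110: mode group, subgroup 3E
NO_ANSWER = 0xFF        # assumed value read back when no ULAplus responds


class FakeULAplus:
    """Hypothetical model of a ULAplus that implements the magic number."""

    def __init__(self, vendor: int = 0b1010) -> None:
        self.selected = 0
        self.vendor = vendor

    def out(self, port: int, value: int) -> None:
        if port == REGISTER_PORT:
            self.selected = value

    def inp(self, port: int) -> int:
        if port == DATA_PORT and self.selected == MAGIC_SELECT:
            return (0x3A << 2) | (self.vendor >> 2)
        return NO_ANSWER


class NoULAplus:
    """Hypothetical model of a machine with nothing on these ports."""

    def out(self, port: int, value: int) -> None:
        pass

    def inp(self, port: int) -> int:
        return NO_ANSWER


def ulaplus_present(machine) -> bool:
    """Select the magic number register, read it back, compare b7-b2 with 3A."""
    machine.out(REGISTER_PORT, MAGIC_SELECT)
    return (machine.inp(DATA_PORT) >> 2) == 0x3A
```

`ulaplus_present(FakeULAplus())` is true whatever the vendor bits are, because the shift discards b1-b0 before the comparison.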

    I was thinking about this on my drive into work this morning. It would be incredibly important to be able to get the version number out, particularly as future enhancements may change the timing of things - software that dynamically changes the palette on the fly will need to know what register timing to expect if it differs between different firmware releases. It just so happens that Winston's post highlights another issue, which this will help solve.
  • edited November 2009
    Winston wrote: »
    My thoughts are to avoid partial decodes with new hardware at any cost. The I/O space is already pretty full, and only decoding A15/A14/A0 means an almost certain collision with something (decoding A0 = 1 means the LSB already collides with everything, and if the MSB is 10xx xxxx in binary it collides with the Spectranet - not that the Spectranet cares, but it means every time it changes memory pages the palette will change!)

    The upper address bus decoding is a nightmare on the speccy, because you need to assert the RAS line. That will kick off a read cycle of the dram, and worse still, clash with any CPU access of the memory.

    The port is also contended anyway, because when the CPU is accessing the RAM, the data bus is in use ;-) So it's just another ULA port, with the same contention as the existing ULA port. Touching the RAS line to switch the multiplexer during I/O when the CPU might be accessing the memory will be a real problem, as you'd basically end up contending the CPU for 8 cycles in 8 (while you hit the new registers).

    Now, here's the thing.

    I've been tinkering with the implementation of this in hardware, and entertaining the possibility of using some off-chip memory for the palette. Of course this is complicated, but it is significantly easier to handle contention etc (of which there will be lots) if access to the palette is only allowed during the border, or even better during a vertical sync.

    The reason for this is handling contention on the palette RAM. This RAM will basically be sending out streams of 8bit colour info (RRRGGBB) to the output circuitry. Interrupting this to update the palette will cause contention, so to avoid issues, border or vsync only update is a good thing.
    Further, because the palette will also be outputting the border colour, limiting update to the vsync is an even better thing. However, that will really mess up the idea of changing the palette on an arbitrary scan line, which I'd like to allow. Some round-the-houses buffering will be required, where the palette output is fed back into the CPLD/FPGA and then out again to the RGB colour circuitry.

    Do we really really really need per-scanline palette update?? I'm trying to keep this CHEAP AND SIMPLE.

    This is one of the reasons I want a firmware version register - some firmware in the future (say ULA++ at twice the price) may allow per-scanline palette update.
  • edited November 2009
    csmith wrote: »
    Vendor is thus 0-15, Major Version 0-7, minor version 0-7
    I declare that vendor 1010 will be me (ZXDesign), emulators can have their own vendor code. 0001 = Spin etc.

    As far as I'm concerned, emulators should emulate the hardware to the best of their ability. I don't see the need for a vendor code at all: either something is compatible with the original design, or it's not. Let's not burden application authors with having to go down a road of "vendor code 0x0001, implement these bug workarounds. Vendor code 0x0002, implement a different set." and so on. Certainly, Fuse will return the same value as the real hardware.
  • edited November 2009
    csmith wrote: »
    I've been tinkering with the implementation of this in hardware, and entertaining the possibility of using some off-chip memory for the palette. Of course this is complicated, but it is significantly easier to handle contention etc (of which there will be lots) if access to the palette is only allowed during the border, or even better during a vertical sync.

    I was thinking you may have to go down this route, purely because of reasons of CPLD resources :-) I was taking a look at how many registers the ULA+ was going to need and thinking "hmm, I think that's *all* the available registers on a '288 taken up"...
  • edited November 2009
    As far as I'm concerned, emulators should emulate the hardware to the best of their ability. I don't see the need for a vendor code at all: either something is compatible with the original design, or it's not. Let's not burden application authors with having to go down a road of "vendor code 0x0001, implement these bug workarounds. Vendor code 0x0002, implement a different set." and so on. Certainly, Fuse will return the same value as the real hardware.
    Fair enough Philip.
    I too thought that vendor might be too much of an overkill, however, firmware versions get us out of a hole when behaviour could change.
    Having a vendor code is kinda traditional though, which is why I slipped it in.
    Scrap vendor, go with magic number and major/minor release numbers.

    Or even magic number and eight bits defining the features present, because that's what I figure the firmware version would be used for anyway.

    Of course, open to suggestions, but I'd like to refer readers to my earlier post about changing/extended future behaviour with regard to timing etc.
  • edited November 2009
    Winston wrote: »
    I was thinking you may have to go down this route, purely because of reasons of CPLD resources :-) I was taking a look at how many registers the ULA+ was going to need and thinking "hmm, I think that's *all* the available registers on a '288 taken up"...
    Aye lad. We're talking about implementing memory in the CPLD. Hmm.

    Off chip is the way to go. Now, if I can only get people to agree that a vsync only update of the palette is okay (the port will contend until then), things will get cheaper and easier to handle.

    EDIT: Even FPGA is a nightmare. The Harlequin will be FPGA based, hopefully using no external memory. Getting the palette into there as well is going to get in the way of all the other video related timing. Some off-chip memory is going to happen somewhere I feel.
  • edited November 2009
    csmith wrote: »
    Do we really really really need per-scanline palette update??
    YES!

    The message you have entered is too short. Please lengthen your message to at least 5 characters.:roll:

    YES, you &!*#$@ well do! :grin:
    Joefish
    - IONIAN-GAMES.com -
  • edited November 2009
    As far as I'm concerned, emulators should emulate the hardware to the best of their ability. I don't see the need for a vendor code at all: either something is compatible with the original design, or it's not. Let's not burden application authors with having to go down a road of "vendor code 0x0001, implement these bug workarounds. Vendor code 0x0002, implement a different set." and so on. Certainly, Fuse will return the same value as the real hardware.
    I agree here. It'd be nice for the software to detect what it's running on, but I'd rather see the emulation match up to the real thing. Obviously that's shifting the work from game developers to emulator developers, but hey, gotta call it first! :lol:
    Joefish
    - IONIAN-GAMES.com -
  • edited November 2009
    If storage resources are short, I'd suggest that 64 colours is a lot - could you settle for 32, with the FLASH and BRIGHT bits just separate extensions of the INK and PAPER range. So, 16 INKs and 16 PAPERs? That's still as much as the Amiga and more than the ST. As a game developer, I could live with that. Though I'd probably soon live to regret saying so...

    As for per-line colours, I could get by with only changing the palette in the border area. I'd like to change it down the screen for doing sky-gradients in a single colour index like 16-bit games do (in conjunction with rainbow effects), so doing it in the middle of a line would be messy. In fact, if I could change it in the middle of a line and it only took effect on the next raster flyback, that would be very helpful indeed! It's bloody hard to time a border colour change amongst all the other stuff I want to do, so auto-synching it would be a big help.
    Joefish
    - IONIAN-GAMES.com -
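The "takes effect on the next raster flyback" idea can be sketched as a palette with a pending-write buffer. The class and method names are hypothetical and this is a behavioural model only; it says nothing about how the hardware would actually buffer writes.

```python
from typing import Dict, List


class LatchedPalette:
    """Palette whose writes only become visible at the next flyback."""

    def __init__(self, size: int = 64) -> None:
        self.active: List[int] = [0] * size   # colours the display is using now
        self.pending: Dict[int, int] = {}     # writes waiting for the flyback

    def write(self, index: int, colour: int) -> None:
        """Queue a change at any point during the line."""
        self.pending[index] = colour & 0xFF

    def flyback(self) -> None:
        """Apply every queued change in one go, then clear the queue."""
        for index, colour in self.pending.items():
            self.active[index] = colour
        self.pending.clear()
```

A write made mid-line leaves `active` untouched until `flyback()` is called, which is the auto-synchronising behaviour asked for above.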
  • edited November 2009
    csmith wrote: »
    Scrap vendor, go with magic number and major/minor release numbers.

    More than happy with that :-)
  • edited November 2009
    joefish wrote: »
    YES!
    Oh, oh, OH.
    I could have guessed YOU would want this!
    :grin:

    Are you absolutely sure that 64 colours aren't enough for you in a single frame? Serious question actually, as it is a lot of work to allow changing the colours in the palette between scan lines, and is it worth it? Keep in mind that I am a software developer as well, and I'd like to allow this, but it will cost, and I don't know how useful it will be.

    For example, how many OUTs can you realistically do in the 96 T-states of 'border and hsync' time? You'd need two for each palette entry, so theoretically you'd only be able to write four entries (96 / (11 * 2) = 4.36), and fewer in reality, no doubt, what with setting up data.

    That's not a lot of change. A full 16 entry CLUT would take at least 4 scanlines, again probably more.
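The budget above is easy to check with a couple of lines of arithmetic. The sketch below uses the figures from the post (96 T-states per window, 11 T-states per OUT, two OUTs per entry); the function names are hypothetical, and the defaults can be changed to try other instruction timings.

```python
import math


def entries_per_line(window_t: int = 96, out_t: int = 11, outs_per_entry: int = 2) -> int:
    """Whole palette entries that fit in one border-and-hsync window."""
    return window_t // (out_t * outs_per_entry)


def lines_needed(entries: int, window_t: int = 96, out_t: int = 11,
                 outs_per_entry: int = 2) -> int:
    """Scanlines needed to rewrite the given number of entries, at best."""
    return math.ceil(entries / entries_per_line(window_t, out_t, outs_per_entry))
```

With the default figures this gives four entries per line and four scanlines for a 16-entry CLUT, ignoring any time spent loading registers.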

    Thoughts joefish?

    Not trying to get out of this, but I'd like expectations to be realistic and managed before we commit to a particular behaviour, especially as the hardware consequences are quite high.
  • edited November 2009
    csmith wrote: »
    Are you absolutely sure that 64 colours aren't enough for you in a single frame? Serious question actually, as it is a lot of work to allow changing the colours in the palette between scan lines, and is it worth it? Keep in mind that I am a software developer as well, and I'd like to allow this, but it will cost, and I don't know how useful it will be.

    If you look at the CPC, which has a switchable palette, there are an astounding number of effects that can be achieved by palette cycling part way through a display (for example the full screen scrolling text messages in Prehistorik 2), it's possibly one of the most powerful tools for creating great effects. It would be a shame not to be able to do it on the ULA+.

    I'm not as au fait with the new design (I've only skimmed over the technical details so far), but wouldn't restricting palette changes to a fixed point in time actually increase the hardware requirements? You'd then have to buffer changes in some additional registers/RAM, then copy them into the output hardware at vsync in order to keep the on-screen colours available to the output hardware during a frame.
  • edited November 2009
    I can understand why it's technically difficult, so I'm not going to press the issue - much!

    In reality, I'd only ever try to change one, or two colour indices at most - one paper and possibly one border. Would it help if say, only FLASH 0 BRIGHT 0 PAPER 0 could be accessed on the fly? Or is it all-or-nothing? If you write the colour index command once, can you repeatedly re-write the data byte without issuing a new command byte?

    Being able to cue a colour change and have it affect the next whole line would be great for extending the screen image into the border - trying to time a border change with the flyback can be awkward, and reduces the width of any rainbow effect (as per my River Raid Tech Demo). The additional overhead of two OUTs to make a palette change instead of one for a border change seems like a lot, but if that palette change affects the border and the paper of a whole row, that'd be worth a bit of effort, surely? You could have colour bars right across the screen.

    64 colours really is enough to just throw 24 different papers at the problem of sky gradients and have different papers on each character row, using attributes as normal. It limits what you can do in the border (just 8 colours) but I wouldn't be too upset by that.

    The only other advantage of palette changes per-line is in simplifying compatibility with a standard Spectrum. Something like Buzzsaw could have an ordinary black background on a standard Speccy, but a colour gradient on an enhanced one. I wouldn't have to set individual per-line attributes - the effect would all be done by palette changes - so the game graphics code would be the same.

    But you're right to caution about the extra time taken, so maybe this method wouldn't be fast enough. That 'realistic expectations' thing again.

    I'll take whatever you can give. But any little extra could give an advantage. If just one colour can be addressed (maybe the command word is fixed but the data word can still be written, or there's a specific command for streamlining colour 000) then that would lend itself to some impressive effects. Just think, 16-bit moving full-width colour bars.
    Joefish
    - IONIAN-GAMES.com -
  • edited November 2009
    AndyC wrote: »
    If you look at the CPC, which has a switchable palette, there are an astounding number of effects that can be achieved by palette cycling part way through a display (for example the full screen scrolling text messages in Prehistorik 2), it's possibly one of the most powerful tools for creating great effects. It would be a shame not to be able to do it on the ULA+.
    Thing is, a lot of other machines could trigger interrupts part-way down the screen to insert these changes at the right moment. On the Speccy, you have to time these yourself, which hogs processor time unless you can carefully write routine tasks to take up exactly the required time.

    Of course, another Speccy mod might be to provide a selectable raster interrupt that cuts in every n lines of the screen - but it'd be a waste of time if you couldn't then change the palette.
    Joefish
    - IONIAN-GAMES.com -
  • edited November 2009
    The ports are not changing.

    [strike]Ok, here are my thoughts:

    The mode has been implemented in five emulators and several bits of new software and tools on port #xx3B. As with Sinclair's incomplete ROM, I think the cat is out of the bag on this one and I'd rather have a wire from the ULA to the Z80 on the real hardware than change the spec, although I agree that doing something sensible with #xx3A in order to make more ports available to developers is a *good thing*.

    32 colours are not enough to do even the C64 hi-res screens. You need 48 for that because of the CLUT restrictions. I can think of a workaround, but again, I'm reluctant to change the specification at this point.

    The existing scheme is relatively trivial to implement in emulation, making it more likely to get widespread support. Changing the scheme now could put people off implementing it.[/strike]
  • edited November 2009
    aowen wrote: »
    32 colours are not enough to do even the C64 hi-res screens. You need 48 for that because of the CLUT restrictions. I can think of a workaround, but again, I'm reluctant to change the specification at this point.

    Actually it is enough if it is implemented "properly". If you regard the FLASH bit as the 4th (highest) bit of INK and the BRIGHT bit as the 4th bit of PAPER then you will have 32 colours, although of course you can only combine a colour from the INK set with a colour from the PAPER set.
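A minimal sketch of that 32-colour mapping follows, using the standard Spectrum attribute layout (bit 7 FLASH, bit 6 BRIGHT, bits 5-3 PAPER, bits 2-0 INK). Placing the PAPER set at entries 16-31 is my own assumption for illustration, and the function name is hypothetical.

```python
from typing import Tuple


def clut_indices_32(attr: int) -> Tuple[int, int]:
    """Map an attribute byte to (ink_index, paper_index) in a 32-entry CLUT."""
    flash = (attr >> 7) & 1
    bright = (attr >> 6) & 1
    paper = (attr >> 3) & 0b111
    ink = attr & 0b111
    ink_index = (flash << 3) | ink                  # entries 0-15: FLASH is INK bit 3
    paper_index = 16 + ((bright << 3) | paper)      # entries 16-31: BRIGHT is PAPER bit 3
    return ink_index, paper_index
```

For example, an attribute with FLASH 1, BRIGHT 1, PAPER 2, INK 1 maps to ink entry 9 and paper entry 26, so each cell picks one of 16 inks and one of 16 papers.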

    But as you said... I am not sure that changing this NOW is a wise idea, especially now that people started working with it and writing tools, doing palettes, etc.
  • edited November 2009
    Tom-Cat wrote: »
    But as you said... I am not sure that changing this NOW is a wise idea, especially now that people started working with it and writing tools, doing palettes, etc.

    I would also like to add that #xx3B was chosen specifically as being a harmless port to write to on any existing hardware. The worst thing that can happen when you write to the port without testing the hardware is that you print a bit of junk on someone's ZX Printer. The way I see it, the port is chosen. It's now up to the hardware guys to come up with a way of using it. If it's a case of soldering a single wire somewhere I think that's acceptable, if not ideal.
  • edited November 2009
    aowen wrote: »
    Ok, here are my thoughts:

    The mode has been implemented in five emulators and several bits of new software and tools on port #xx3B.

    [...]

    The existing scheme is relatively trivial to implement in emulation, making it more likely to get widespread support. Changing the scheme now could put people off implementing it.

    i completely agree.

    and palette cycling is for lamers anyway ;)
  • edited November 2009
    Err - as far as I can tell, the emulated implementations allow instantaneous palette-switching, so cutting back on that because of hardware limitations is a change from the status quo. But the hardware is supposed to be leading this...

    P.S. If it is only 32 colours, I'd say BRIGHT should extend INK, and FLASH should extend PAPER - it splits the PAPER bits up but it's probably better for backwards compatibility, where the programmer would be more likely to have used BRIGHT on the INK colour, over black PAPER.
    Joefish
    - IONIAN-GAMES.com -
  • edited November 2009
    AndyC wrote: »
    I'm not as au fait with the new design (I've only skimmed over the technical details so far), but wouldn't restricting palette changes to a fixed point in time actually increase the hardware requirements? You'd then have to buffer changes in some additional registers/RAM, then copy them into the output hardware at vsync in order to keep the on-screen colours available to the output hardware during a frame.

    Not as I understand it, which is that if you write to the palette registers when you're not in vsync, it contends the Z80 until the vsync happens.
  • edited November 2009
    My 2 penn'orth:

    We need to get this sorted out now.

    If per-scanline palette changes are going to be available, we need to get the port issue sorted - contention information (with high accuracy) will be needed, or stuff that people like JoeFish write in ZXSpin will not work on the real thing.

    There's very little software written at the moment that cannot be rewritten to handle any changes.

    I need to know which ports we're using, and how they're decoded. I need to know if we're using an even numbered port (which we should - this should be "plug and play", not "plug and solder a wire to a leg on your Z80") and we need to know how this will affect contention. ULA contention on even ports is complex enough already and accurate documentation will be vital. Even properly contended, palette changes are instant in Spin, which they are almost guaranteed not to be on the real thing.

    For my part, I'd like to see palette changes on a per-scanline basis and 64 colour mode switching (back and forth between ULA+ and normal ULA mode) on a frame-by-frame basis.

    IMHO, we need to keep 64 colours. As much as it would make my job much easier if there were only 32, we're used to 64 now. That said, 32 would be much easier to handle when emulating an 8bpp screen.

    But please sort this out so we don't get a shedload of software that doesn't work because people are too excited to wait much longer before we start seeing new stuff emerging. Lots of developers are getting into this, and we can't stall them for long - they'll lose interest and this will go the way of the SlickStick.

    D.
  • edited November 2009
    aowen wrote: »
    The mode has been implemented in five emulators and several bits of new software and tools on port #xx3B. As with Sinclair's incomplete ROM, I think the cat is out of the bag on this one and I'd rather have a wire from the ULA to the Z80 on the real hardware than change the spec

    I'd rather see us end up with just one implementation of any "super ULA". Certainly for Fuse, and I'd strongly imagine for everything else, changing the port number is going to be trivial. I'd certainly be against having 2 super ULA implementations in Fuse just to deal with the fact that we changed the design.

    This thing's still being developed, and people who are writing code for it are (should) be aware of that and the chance that it will all change when we discover a problem/think of a better way of doing it.
  • edited November 2009
    I'd rather see us end up with just one implementation of any "super ULA". Certainly for Fuse, and I'd strongly imagine for everything else, changing the port number is going to be trivial. I'd certainly be against having 2 super ULA implementations in Fuse just to deal with the fact that we changed the design.

    This thing's still being developed, and people who are writing code for it are (should) be aware of that and the chance that it will all change when we discover a problem/think of a better way of doing it.

    +1

    phoenix
  • edited November 2009
    +1 on the "still being developed"

    On the solder-a-wire, the thing is I suspect it'll reduce demand for the real hardware by 90%. If it's just something you can plug in, even hardware novices would be willing to try it. How much demand for real hardware would there be? Well, judging by the DivIDE, for at least a few hundred I suspect.
  • edited November 2009
    The ports are not changing.
    This thing's still being developed, and people who are writing code for it are (should) be aware of that and the chance that it will all change when we discover a problem/think of a better way of doing it.

    [strike]If it's just a case of changing from #xx3B to #xx3A then that's fine. Existing software can easily be patched to deal with that. If additional complexity is going to be added then that's another thing entirely. I also want some assurances that we aren't going to encounter any other problems by switching to #xx3A, and that #xx3A is the most suitable even port.

    Also if we're going to change the ports, I want to do it once. I do not want people to have to continually rewrite their code once the software base starts growing.

    And I'd like to publish a time-frame, for software that's already in development, but I don't see how I can do that unless we have some real-world hardware testing and that's currently dependent on CSmith.[/strike]
  • edited November 2009
    joefish wrote: »
    Err - as far as I can tell, the emulated implementations allow instantaneous palette-switching, so cutting back on that because of hardware limitations is a change from the status quo.

    I'm not going to write a single instruction for non-existent hardware. Well, it's no big loss, but I've already switched my render model from "two video RAMs" to "chasing the beam", simply because I don't understand the Harlequin situation well and ULA+ seems more likely on the 48K model.