Goldfinch - an open software stack for mass storage

edited June 2009 in Announcements
Yep, another pet project of mine to compete with all the others I've started. But hey, if I didn't get distracted by things like this I'd just get distracted by Youtube and sudoku instead...

http://github.com/gasman/goldfinch/tree/master

Goldfinch (not a particularly meaningful name, just that there was one in my garden on the day I happened to name the project) is an attempt at remedying the "walled garden" syndrome in the world of Spectrum mass storage. There are plenty of software projects doing exciting things with mass storage and filesystems on the Speccy, spread across multiple competing hardware interfaces, but for one reason or another they end up having 'baggage' that prevents the casual tinkerer from properly harnessing that existing work for their own stuff, so writing a program that reads 'some file' off 'some disk' is a bigger deal than it ought to be. The reasons for this might be technical (the disk access code is too tightly coupled to a Basic extension, or an emulation layer, or something else, and only one person in the world understands the whole package), legal (licensing problems prevent the source code from being released), or the entire project being locked away in perpetual vapourware hell (ahem).

So. The initial goal of the project is to create a set of libraries for C and Z80, in z88dk library format (because z88dk is the one project on the scene that solves the issue of code re-use), for accessing mass storage systems in a natural, device-independent way, at whatever level of abstraction is appropriate. To give an example of what I'm getting at, I'll quote from the test program I'm using (and bear in mind this only demonstrates the currently implemented functionality, which is in a ridiculously early stage right now):
#include <stdio.h>
#include "include/divide.h"
#include "include/block_device.h"
 
int main() {
  unsigned char buffer[512];
  unsigned int x,y;
  BLOCK_DEVICE *device;
 
  /* Open the block device for the DivIDE master drive */
  device = divide_open_drive(DIVIDE_DRIVE_MASTER);
  /* Read block number 0 from it */
  read_block(device, (void *)buffer, 0L);
 
  for (y = 0; y < 256; y += 16) {
    for (x = 0; x < 16; x++) {
      printf("%02x ", buffer[y|x]);
    }
    printf("\n");
  }
 
  return 0;
}

Here we're opening the attached DivIDE disk as a block_device, reading its first sector and dumping the first 256 bytes of it. But beyond the divide_open_drive() call, the same bit of code could equally work on a ZXCF-attached disk, or a .TRD image on a FAT16 filesystem, or an .MGT image that's just been streamed over Spectranet... as all of those things will implement the block_device API. And likewise, there'll be a standard API for accessing files, so that C programmers can do the familiar fopen() stuff without worrying about the underlying filesystem.
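To make the "everything implements the block_device API" idea a bit more concrete, here is a minimal sketch of one way it could be arranged in C. This is not Goldfinch's actual layout: the struct fields, BLOCK_SIZE, ram_read and ram_disk_open are all assumptions made up for illustration, and only the read_block(device, buffer, block) call shape is taken from the test program above.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Hypothetical layout: a device is a read function plus private driver state. */
typedef struct block_device {
    int (*read)(struct block_device *dev, void *buffer, unsigned long block);
    void *state;                /* driver-specific data */
    unsigned long block_count;  /* number of blocks on the device */
} BLOCK_DEVICE;

/* Device-independent entry point: the caller never sees which driver is behind it.
   Returns 0 on success, -1 if the block number is out of range. */
int read_block(BLOCK_DEVICE *dev, void *buffer, unsigned long block)
{
    assert(dev != NULL && buffer != NULL);
    if (block >= dev->block_count)
        return -1;
    return dev->read(dev, buffer, block);
}

/* Example driver (an assumption, not a real Goldfinch driver):
   a disk image held entirely in memory. */
static int ram_read(BLOCK_DEVICE *dev, void *buffer, unsigned long block)
{
    const unsigned char *image = dev->state;
    memcpy(buffer, image + block * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

void ram_disk_open(BLOCK_DEVICE *dev, unsigned char *image, unsigned long blocks)
{
    dev->read = ram_read;
    dev->state = image;
    dev->block_count = blocks;
}
```

A DivIDE or ZXCF driver would fill in the same struct with its own read function, and code that only calls read_block() would not need to change.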

As for what this means for the end user - hopefully, by supporting all of these technologies (DivIDE, ZXCF, TRD, FAT...) through common standard mechanisms, we'll get away from the situation where Fatware is "the DivIDE firmware that reads TAP files off FAT with a pretty menu" and ResiDOS is "the DivIDE+ / ZXCF / (your interface here) firmware with Basic extensions to read and write to FAT", and into a bright new future where all of those capabilities will operate together, rather than having every lone genius reinventing their own bits of the wheel.

(Pause for dramatic effect, accompanied by stirring violin music)

...And with a bit of luck, we'll generally lower the barrier towards doing Interesting Things with mass storage on the Speccy, and end up with all sorts of applications that not even I've thought of yet. (I have my own ideas about how to take this further with dynamically loaded modules and things which kind of segue into a whole new operating system for the Spectrum, but let's not get ahead of ourselves here.)

Anyway. This is still in a proof-of-concept-ish stage, but I wanted to get it out in the open early so that it wouldn't be dominated by my own craziness and turn into another walled garden. I've set it up on Github, which all the cool kids seem to be using these days to collaborate on code, so feel free to stick your nose in regularly as it turns into something that might be useful.

Comments

  • edited May 2009
    How about implementing the ability to use a PC for mass storage attached to the tape in?

    e.g. a BASIC app runs on the spectrum that saves a file name to "tape" (an audio connection to a PC running a custom server in the broom cupboard). The saved file contains REM statements to identify a given file on a given path on the PC (that will have been provided via an earlier "list" save REM request to the server). The BASIC program then attempts to load (via load "") whatever data is next sent back from the PC.

    I can see potential that no special hardware is needed and all users need ever do is run a small BASIC app which could be loaded from disk, preloaded from the server (e.g. just hit enter on the 128s) or other spectrum interface.

    IMHO this would be ideal for users whose first boot device may be a Plus D or +3 disk - future versions could implement some sort of speed loader too?

    I'm not much of a Speccy developer but a menu system for the Spectrum would be easy to write in BASIC, and a simple script on the PC side to retrieve tap files and play them back (perhaps using tape2wav or similar) should be easy to hook up. The difficult bit would be decoding the incoming sound on a PC to find the tape file that needs to be provided, but as most emulators can do this it probably wouldn't be too difficult....

    It might suck a bit for multipart tape files however :(
  • edited May 2009
    Mmm, so what you're talking about is a two-way data transfer protocol over the MIC and EAR sockets, built on top of the Spectrum tape loading scheme to a lesser or greater extent? Spectrum and PC chirruping back and forth in conversation? I like it. In fact, it's such a neat idea that I'm half convinced there must be a fatal flaw in it otherwise someone would have thought of it and implemented it already... :-)

    For the particular example of selecting and loading TAP files, I can't see it offering much of a benefit over the more straightforward solution of picking your files on the PC side (for people who don't have their PCs in a cupboard with an audio cable hanging out of it, that is...), but for applications that involve random access it would open up new possibilities. You could emulate a disk system through it, loading and saving sectors on demand. Or even browse the web over it...

    'Course, this Goldfinch project is more about standardising existing mass storage solutions than coming up with new ones. But if someone can make it work, I'll certainly consider supporting it!
  • edited June 2009
    Dammit, I wish I'd seen this thread when it was new...
    gasman wrote: »
    Mmm, so what you're talking about is a two-way data transfer protocol over the MIC and EAR sockets, built on top of the Spectrum tape loading scheme to a lesser or greater extent? Spectrum and PC chirruping back and forth in conversation? I like it. In fact, it's such a neat idea that I'm half convinced there must be a fatal flaw in it otherwise someone would have thought of it and implemented it already... :-)

    Actually, I did think of it :-) About 3 or 4 years ago I started working on something like this to talk using Manchester encoding to the PC.

    But then the idea of just using proper ethernet hardware came along and it seemed like a much better idea, since the Spectrum would be self-sufficient, and not reliant on a local PC. It'd be the equal of a PC, not the subordinate, second class dependent!

    Now onto your project.

    This looks like a fairly low-level thing: you may have seen my mutterings about the Spectranet VFS layer. But this provides a higher level abstraction, essentially something extremely similar to the POSIX fcntl. There's a good reason for this... the Z88DK uses a POSIX-like fcntl for its lower level I/O operations. And, well, so does everyone else. If I'm not wrong, your project looks to be the thing that would sit below the fcntl. Which is good because that means we're not just duplicating effort :-) Once I've got the latest bits and pieces put together, the next step is to integrate it with the z88dk fcntl - which I think abstracts things quite well - at a file I/O level I think it's already possible to write one set of source code that can be built for most things.

    But onto the walled-garden thing: I thought the z88dk already provided a way of accessing "some file" from "some mass storage" in an abstracted way via the fcntl of which I spoke - or am I just missing something here? (I recall you talking earlier of needing a high performance method of getting data off discs, for example, to show video). I also recall something about the z88dk guys completely overhauling the z88dk fcntl such that it would work for any I/O, from sockets, RS232 ports, files, the lot.

    Anyway, I've had a couple of beers now so I perhaps should not ramble any further :-)
  • edited June 2009
    Winston wrote: »
    But onto the walled-garden thing: I thought the z88dk already provided a way of accessing "some file" from "some mass storage" in an abstracted way via the fcntl of which I spoke - or am I just missing something here? (I recall you talking earlier of needing a high performance method of getting data off discs, for example, to show video). I also recall something about the z88dk guys completely overhauling the z88dk fcntl such that it would work for any I/O, from sockets, RS232 ports, files, the lot.

    Gasman posted a notice about this project at the z88dk boards and yes I do see a lot of overlap with what we are trying to do, what gasman is trying to do, and what you're doing with VFS.

    The main problem with the existing z88dk stdio implementation is that it can only support one mass storage device in a program (in addition to tape and sockets, as they are separate). This is because the stdio device calls (fopen, etc) are bound to a single device driver, so to speak, at compile time. The other problems with the existing stdio are that it is just too large for my taste (owing to it being implemented mostly in C) and that it is not a 'full' implementation, as it is missing several features that I think are required for the brave new future of internet and random access storage.

    The new stdio implementation, which I hope will be incorporated in a test target in the next release July 15, aims to solve all these problems as well as solve the device abstraction issue that gasman's project is concerned with.

    Here's how it works (and this may change as the implementation is completed):

    The startup code (the bit of code that runs prior to main) contains an optional user-specified table of letter and device driver address associations. It also contains an optional default device driver address. Also in the startup are the three pre-opened streams stdin, stdout, stderr which can be associated with any device driver as well.

    In the user program, code can open a file on a specific device by specifying its letter name in the first two characters of the fopen string:
    FILE *in;
    in = fopen("d>/games/rtype.sna","r");   // d is a disk device
    in = fopen("s>9600;8/N/1","rw");   // s is a serial device
    

    The fopen code locates the driver in the aforementioned table that is associated with the letter name and passes a pointer to the remaining filename (with the first two chars pruned) which the driver can interpret as it wishes. In the first case, we have a path to a filename for a disk device and in the second case the serial driver is picking off baud rate, parity, etc from the filename string. In the latter case it may actually be preferable to use ioctl to set communication settings, but this is just an example.
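    The lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: struct driver, driver_table and find_driver are hypothetical names, and the real z88dk table holds driver addresses rather than name strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver record; the real z88dk structures may well differ. */
struct driver {
    char letter;        /* device letter used in the fopen string */
    const char *name;   /* stand-in for the driver's entry address */
};

static const struct driver driver_table[] = {
    { 'd', "disk" },
    { 's', "serial" },
};

/* Split "d>/games/rtype.sna" into a driver and the remaining string.
   Returns NULL when there is no "x>" prefix or the letter is unknown,
   at which point a caller could fall back to a default driver. */
const struct driver *find_driver(const char *path, const char **rest)
{
    size_t i;

    assert(path != NULL && rest != NULL);
    if (path[0] == '\0' || path[1] != '>')
        return NULL;
    for (i = 0; i < sizeof driver_table / sizeof driver_table[0]; i++) {
        if (driver_table[i].letter == path[0]) {
            *rest = path + 2;   /* prune the two prefix characters */
            return &driver_table[i];
        }
    }
    return NULL;
}
```

    The driver then receives only the pruned string ("/games/rtype.sna" or "9600;8/N/1") and is free to parse it however it likes.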

    gasman's block read and display code might look something like this:
    #include <stdio.h>
    
    unsigned char buffer[512];
    
    main()
    {
       FILE *in;
       unsigned int x,y;
    
       /* Open the block device for the DivIDE master drive */
    
       in = fopen("d>1:hexdump","r");
    
       /* Read block number 0 from it */
    
       fread(buffer, 1, 512, in);
    
       for (y = 0; y < 256; y += 16)
       {
          for (x = 0; x < 16; x++)
             printf("%02x ", buffer[y|x]);
          printf("\n");
       }
     
      return 0;
    }
    


    or even this:
    #include <stdio.h>
    
    main()
    {
       FILE *in;
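       unsigned char c;   /* %c stores into a char, not an int */
    
       in = fopen("d>1:hexdump","r");
    
       /* fscanf returns EOF (non-zero) at end of file, so test for exactly one item read */
       while (fscanf(in, "%c", &c) == 1)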
          printf("%02x ", c);
    
       printf("\n\n");
    
       fclose(in);
    }
    

    Here I assume the divide driver parses the string "1:hexdump" as the file "hexdump" on drive 1. And, of course, you could operate at the file descriptor level as well.

    The nice thing about this is you could replace the driver associated with the "d" character from divIDE to +3 and have the same program run on a +3 system rather than divIDE. This modification would require a quick recompile to pull in the right driver code but we can take things a step further. Suppose we constructed an 8k rom containing a z88dk stdio core customized to a specific set-up. Device 'd' could be associated with the hw's main disk device in rom. Then any program written to use the stdio core from this rom could be run and access the correct disk device on any hardware setup no matter what disk system was in actual use. This would be binary compatibility across any hw platform.

    The new stdio is message based. The high level i/o functions (FILE* stuff) and low level functions (file descriptor stuff) all generate a finite set of messages (a register containing a message id which should be used as an index to a jump table, along with parameters passed in other registers) which ultimately get passed to the device driver, but may be filtered along the way as messages travel through the stdio chain. These message ids are roughly split into groups. The first group is basic i/o, another group is directory management, another is socket-related. The device driver can choose to implement whatever makes sense and get some stdio code to emulate others (if block i/o is not supported, a stdio library function can emulate it with character i/o). The device driver gets the message and a pointer to a file structure that identifies the specific file being operated on and may contain data the device driver has stored there (eg file pointer, buffer pointer, etc). The driver should do a jump table lookup for the message id and jump to the relevant function to handle the message.
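    In C terms, the jump table dispatch might look something like the sketch below. The real thing passes the message id and parameters in Z80 registers; the message ids, struct file layout and send_message function here are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message ids; the real set is defined by the stdio library. */
enum { MSG_READ_CHAR, MSG_WRITE_CHAR, MSG_CLOSE, MSG_COUNT };

struct file;
typedef int (*handler_fn)(struct file *f, int arg);

struct file {
    const handler_fn *handlers;  /* driver's jump table, indexed by message id */
    int last_char;               /* example of per-file data kept by the driver */
};

static int null_write(struct file *f, int arg) { f->last_char = arg; return arg; }
static int null_close(struct file *f, int arg) { (void)f; (void)arg; return 0; }

/* This example driver implements write and close only; the NULL slot for
   MSG_READ_CHAR marks a message it does not handle. */
static const handler_fn null_driver[MSG_COUNT] = {
    NULL, null_write, null_close
};

/* Look the message id up in the driver's table and jump to the handler.
   Returns -1 for an unknown or unimplemented message. */
int send_message(struct file *f, int msg, int arg)
{
    assert(f != NULL);
    if (msg < 0 || msg >= MSG_COUNT || f->handlers[msg] == NULL)
        return -1;
    return f->handlers[msg](f, arg);
}
```

    Where this sketch just returns -1 for an unimplemented message, the library described above could instead substitute an emulation built from the messages the driver does support.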

    About the stdio chain... As always there is a high level interface to a device driver, embodied in FILE*, and a low level direct interface to the driver embodied in an integer file descriptor. Both the FILE* structure and the FD structure contain a bit of asm at the top that either calls or jumps to the next driver function in a chain. Normally the FILE* structure and the FD structure will point directly at a device driver, but filters can be inserted in between. Eg, FILE* may have a buffering filter inserted. Or FD might get a compressor / decompressor filter inserted. Other filters may include character set translation (think zx81), etc. So it would be possible, on the fly, for the stdio chain to take care of a lot of interesting details.

    This is how such a thing might work:

    a low level file is opened on a device using open(), returning an fd
    a filter fd is created using filter() that forwards messages to this new file
    a high level interface is created on the filter fd using fdopen()

    printf() through the high level FILE* returned by fdopen would forward character output to the filter fd, which would forward transformed output to the low level file which is connected to the device driver. Closing the original low level file removes the FD struct from the fd table but does not close the file until the filter fd or FILE* is closed (reference counting is used to ensure this).
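    A rough sketch of the chaining and reference counting, under heavy assumptions: struct node, the upper-casing filter and node_release are invented here to show the idea and are not the z88dk structures or its filter() call.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical link in the stdio chain: either a filter or the driver itself. */
struct node {
    struct node *next;                    /* next link towards the driver, NULL at the driver */
    int (*put)(struct node *self, int c); /* handle one output character */
    int refs;                             /* how many handles or filters point here */
    int is_open;
    int last;                             /* driver only: last character written */
};

static int driver_put(struct node *self, int c) { self->last = c; return c; }

/* Example filter: transforms output, then forwards it down the chain. */
static int upper_put(struct node *self, int c)
{
    return self->next->put(self->next, toupper(c));
}

void node_init_driver(struct node *n)
{
    n->next = NULL; n->put = driver_put; n->refs = 1; n->is_open = 1; n->last = 0;
}

void node_init_filter(struct node *n, struct node *target)
{
    n->next = target; n->put = upper_put; n->refs = 1; n->is_open = 1; n->last = 0;
    target->refs++;   /* the filter now holds the target open */
}

/* Drop one reference; really close only when nobody points here any more. */
void node_release(struct node *n)
{
    assert(n != NULL && n->refs > 0);
    if (--n->refs == 0) {
        n->is_open = 0;
        if (n->next != NULL)
            node_release(n->next);
    }
}
```

    With this arrangement, releasing the original low level node leaves it open for as long as the filter still refers to it, which is the behaviour described above.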

    Anyway this is the way things are shaping up. Sorry for the length :)
  • edited June 2009
    Yep, that's a nice summary - cheers AA. I'd homed in on z88dk as the platform that would make my idea work (because object-code linking is an essential part of it all) and launched into building (and announcing) it, before discovering that the z88dk community was very much heading in this direction already. Which is a good thing, because it's a wheel I can avoid reinventing, but it does mean there's a lot more careful planning I need to do at the outset to make it all fit together...
    The startup code (the bit of code that runs prior to main) contains an optional user-specified table of letter and device driver address associations.

    Just to be clear, would this also permit letters to be associated with device drivers during runtime too? I think this would be a necessary part of dealing with PC filesystems - for example, if you were writing a DivIDE firmware that supported multiple filesystems - FAT16, FAT32 and EXT2 say - you wouldn't know which drivers to attach to which drive letters until your program had started. And it'd also enable more exotic things, such as opening an .iso image (or a .trd or .tap file, or whatever else could usefully be considered as a navigable filesystem) as a drive letter of its own. (Originally I was imagining a UNIX-style system of mount points for this, but that might be overkill - on one hand you're not limited to 26 (or however many) drives that way, but on the other hand the faff of managing a dynamic table of paths rather than a fixed-length table might just not be worth it.)
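    For what it's worth, the fixed-length-table option could be as small as the sketch below. Everything in it (struct fs_driver, drive_attach, drive_lookup, drive_detach) is hypothetical and only shows the shape of a runtime-writable drive letter table, not anything z88dk currently provides.

```c
#include <assert.h>
#include <stddef.h>

#define DRIVE_COUNT 26

/* Hypothetical placeholder for a filesystem driver (FAT16, FAT32, ...). */
struct fs_driver { const char *name; };

/* One slot per letter 'a'..'z'; NULL means nothing is attached. */
static const struct fs_driver *drives[DRIVE_COUNT];

/* Map a letter to a table index, assuming a contiguous a-z character set. */
static int slot(char letter)
{
    return (letter >= 'a' && letter <= 'z') ? letter - 'a' : -1;
}

/* Attach a driver at runtime; fails if the letter is invalid or already taken. */
int drive_attach(char letter, const struct fs_driver *drv)
{
    int i = slot(letter);

    assert(drv != NULL);
    if (i < 0 || drives[i] != NULL)
        return -1;
    drives[i] = drv;
    return 0;
}

const struct fs_driver *drive_lookup(char letter)
{
    int i = slot(letter);
    return i < 0 ? NULL : drives[i];
}

void drive_detach(char letter)
{
    int i = slot(letter);
    if (i >= 0)
        drives[i] = NULL;
}
```

    A firmware could probe the partition type at startup and then attach whichever filesystem driver matches, and an opened disk image could be attached to a spare letter in the same way.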
    gasman's block read and display code might look something like this:
    Yup, that's all well and good. Just one proviso though - are you suggesting that the device drivers built on top of this (FAT and so on) should talk to the lower-level device through the file I/O API? If so, I'm not really sold on that idea... it seems like an unnecessary abstraction, because if the FAT driver knows that it needs to grab a directory listing from sector 100 of the disk, it shouldn't have to translate that to 'seek to position 51200 and read 512 bytes' just for the DivIDE driver to translate it back again. For that reason, I envisaged having block devices as a separate construct from file I/O devices. It's possible that I'm underestimating the flexibility of the file I/O API, though, and that it really is the best of both worlds.
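    To spell out the round trip I mean, here is a tiny sketch assuming 512-byte sectors; the function names are made up for illustration.

```c
#include <assert.h>

#define SECTOR_SIZE 512UL

/* What a FAT driver would have to do to reach sector N through a byte-stream API. */
unsigned long sector_to_offset(unsigned long sector)
{
    return sector * SECTOR_SIZE;
}

/* What the disk driver must then do to get the sector number back. */
unsigned long offset_to_sector(unsigned long offset)
{
    assert(offset % SECTOR_SIZE == 0);  /* only whole-sector positions make sense here */
    return offset / SECTOR_SIZE;
}
```

    So sector 100 becomes byte offset 51200 on the way down and has to be divided back into 100 at the bottom, which is the pair of translations a separate block device interface would avoid.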

    And certainly, I see no problem in making the raw disk available as an I/O stream for the applications that *would* benefit from that - backing up a full disk image over a network, for example.

    (Happy to take this discussion back to the z88dk forums if you feel it's more appropriate there, by the way!)
  • edited June 2009
    I guess I can stop telling people about this thread now. :-p