Database model ZXDB

Recently, I have been trying to figure out how to provide my contribution to whoever takes over WoS archive rework next... So I came up with this:


ZXDB is a generic database model, that I planned for compatibility with the WoS archive. In practice, it's a database definition that incorporates characteristics from different archives (you will notice ideas influenced by MobyGames, CASA, and others). It can be used to combine and store data together from multiple sources at the same time. For instance, these table definitions allow importing all available data about every game, then store separately the review scores each game received at different websites, etc. Even so, I specifically tried to ensure the model is flexible enough to support the entire content of current WoS archive without missing any relevant information.


To be honest, I have no idea if my contribution here is really needed or desired... but frankly, it would take me too long to discuss this idea in the WoS forum, just to find out if this model would be useful. Probably even longer than the time it took me to simply model it! Now I just hope I didn't waste my time for nothing :)

A few notes:

* I didn't bother to make this model perfectly database-independent, since I bet anyone would use it in practice on a free MySQL or MariaDB database anyway. Even so, it only uses database features that can be easily ported to other databases in a couple minutes, such as the database-specific BOOLEAN datatype that I used for convenience only.

* This database model aims to balance flexibility and simplicity. I'm sure there are ways to improve it, but if this database is going to be used for the new WoS archive, I advise importing current archive data first. Afterwards, reorganizing the data into a modified database structure would be a lot easier.

* I didn't have access to WoS database definition, so I modeled everything from scratch. However, I'm convinced everything on Infoseek can be represented properly in this new database, after spending a couple hours navigating on the WoS archive looking at many different entries. Obviously I may be wrong, so (constructive) criticisms and suggestions are welcome!

* The first few tables in this model will just contain generic data (countries, idioms, websites, platforms, tape loading schemes, etc) to be referenced by other tables. I have already sorted out their (suggested) content, and I will post this script tomorrow.

* Just to make it clear, I'm not planning to implement any WoS replacement myself! I'm merely providing a database model that others can use. Based on previous forum posts about new WoS development, it seems the old WoS "legacy" database is quite messy. By providing a new "clean" database, I'm hoping to reduce the burden on whoever assumes responsibility over WoS development afterwards.

The database script is available here. I just hope others (particularly WoS management team or whoever may continue to develop this site) will find it useful. Feel free to use it!
Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
Thanked by 1mik3d3nch
«13456730

Comments

  • On a related note, and purely to understand the database application better, would WoS database be better off using NoSQL instead of a traditional RDBMS structure? Note that I'm not suggesting any such move, just want to know a typical use case where NoSQL or RDBMS makes sense over the other, and using WoS database as an example as it's easier to relate to.
  • ZXDB is a generic database model, that I planned for compatibility with the WoS archive.

    For those not in the know is this potentially a new WoS front end then? IF so - Wowsers!

    How would new game uploads fit into this?


  • Lots of stuff...

    That's pretty much the database plan in place - so well done... it shows I was doing the right thing!
    Importing isn't massively easy - there is some cleansing to be done - and there was a plan in place for the community to be involved with data checking.

    I can see some tables missing or not needed - depends on how it's built. For example I was adding an entry for the RZX Youtube link to titles so they could be shown on the page, splitting people so that individual roles can be shown/searched. Want to know who did that great music? No problem - want to see what other music they did? No problem...

    We were also adding developer notes to titles, and including files - a lot of developers have shown an interest in adding these notes and drawings for archive purposes.

    Inlays/screenshots can be added either as a directory or db entry. I have done publisher logos as a directory, so would have probably carried that on with screenshots and inlays.

    Where you have downloads as a table - I have files. There is also a directory structure in the database that the files link to - so it's a virtual drive. When you view a publisher on the dev site at the moment - all of their files have the details pulled from the database first - so more information can be stored, and the filenames are encrypted as a level to stop leeching.
    R-Tape wrote: »
    For those not in the know is this potentially a new WoS front end then? IF so - Wowsers!

    Not sure what you mean there - he's only put up a database layout.
    My test signature
  • edited July 2016
    Lots of stuff...

    That's pretty much the database plan in place - so well done... it shows I was doing the right thing!
    Importing isn't massively easy - there is some cleansing to be done - and there was a plan in place for the community to be involved with data checking.

    I can see some tables missing or not needed - depends on how it's built. For example I was adding an entry for the RZX Youtube link to titles so they could be shown on the page, splitting people so that individual roles can be shown/searched. Want to know who did that great music? No problem - want to see what other music they did? No problem...

    We were also adding developer notes to titles, and including files - a lot of developers have shown an interest in adding these notes and drawings for archive purposes.

    Inlays/screenshots can be added either as a directory or db entry. I have done publisher logos as a directory, so would have probably carried that on with screenshots and inlays.

    Where you have downloads as a table - I have files. There is also a directory structure in the database that the files link to - so it's a virtual drive. When you view a publisher on the dev site at the moment - all of their files have the details pulled from the database first - so more information can be stored, and the filenames are encrypted as a level to stop leeching.

    Any plans to release the current WoS database as CSV files or similar to the community before you move on so others can clean it up? I have been working on something similar also in mysql and have the magazine's database and the API export from WoS but not much else. I assume there is a table adding the infoseek ID to the magazine's database for instance. Seems mad having to recreate from the JSON feed.

    Post edited by jonesypeter on

  • Any plans to release the current WoS database as CSV files or similar to the community before you move on? I have been working on something similar also in mysql and have the magazine's database and the API export from WoS but not much else. I assume there is a table adding the infoseek ID to the magazine's database for instance. Seems made having to recreate from the JSON feed.

    Currently the only database tables we have are the publishers as I have been concentrating on the core modules such as user permissions, notifications of changes, change requests, etc - all of the modules that will be re-used.

    There is no real database for the magazines - the old WoS uses the SPOT database, but this is no longer maintained. We were therefore planning to manage these references ourselves and index all of the magazines and pages so more info can be searched - the big advantage being the database could store text versions of the pages. Same goes for books.
    My test signature
    Thanked by 1jonesypeter
  • edited July 2016
    Noise
    Post edited by hikoki on
  • R-Tape wrote: »
    For those not in the know is this potentially a new WoS front end then?

    It's actually the "back-end part" of a new front-end.

    In a nutshell, providing a new WoS front-end would take these steps:

    1.) Design a new database model;

    2.) Import current WoS data to populate this database;

    3.) Write SQL queries to access this data; ***

    4.) Implement a front-end to exhibit query results as HTML pages, and also search forms to receive user input.

    I just provided step 1 above. I'm offering to also provide steps 2 and 3, then someone else will need to work on step 4.

    R-Tape wrote: »
    How would new game uploads fit into this?

    Supporting archive updates again would take yet another step (although it could be done in parallel):

    5.) Implement a submit form to receive archive update information from administrators, then execute SQL statements to insert them into database.


    *** NOTE: Since the actual meta-data should be small enough to fit into memory, there's really no need for SQL queries. At web application start, a simple initialization code could load all data from database into a proper data structure in memory, afterwards every database search would just need to traverse this data. Therefore the application would become considerably simpler and faster. I'm also willing to implement this data structure and the corresponding initialization code in Java, instead of SQL queries, if anyone's interested. This is my recommendation for the new WoS site, but this decision lies on whoever will take responsibility for developing the front-end...
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016

    Any plans to release the current WoS database as CSV files or similar to the community before you move on? I have been working on something similar also in mysql and have the magazine's database and the API export from WoS but not much else. I assume there is a table adding the infoseek ID to the magazine's database for instance. Seems made having to recreate from the JSON feed.

    Currently the only database tables we have are the publishers as I have been concentrating on the core modules such as user permissions, notifications of changes, change requests, etc - all of the modules that will be re-used.

    There is no real database for the magazines - the old WoS uses the SPOT database, but this is no longer maintained. We were therefore planning to manage these references ourselves and index all of the magazines and pages so more info can be searched - the big advantage being the database could store text versions of the pages. Same goes for books.

    Thank Lee. I was requesting the data tables from the current site rather than your new super duper one. I have the Spot database in mysql already and the flat file with the basic infoseek info, but nothing to join them. Thanks
    Post edited by jonesypeter on
    Thanked by 1ingo
  • I get the feeling that WoS traffic is being vastly under estimated here....
    My test signature
  • I have a Raspberry Pi 3. Would that not be enough☺
  • I have a Raspberry Pi 3. Would that not be enough☺

    It'd certainly stop the scrapers and leechers!

    My test signature
  • I have a Raspberry Pi 3. Would that not be enough☺

    It'd certainly stop the scrapers and leechers!

    It would that!

  • The thinking is roughly the same as mine - however I had written a cache module so the DB search results are all stored in a folder - makes it easy to delete them when there is a change too. Every module is separated, so individual caches can be deleted via the admin interface.
    My test signature
  • That's pretty much the database plan in place - so well done... it shows I was doing the right thing!

    Cool, I'm glad this database model will be useful!

    Importing isn't massively easy - there is some cleansing to be done

    I thought so.

    But no problem! If you send me whatever data you have, in whatever format, I will have fun sorting it out. :)

    I can see some tables missing or not needed - depends on how it's built. For example I was adding an entry for the RZX Youtube link to titles so they could be shown on the page

    Good idea! I will keep it in mind for future improvements.

    However, I really think we should preserve current data first, and get a fully working WoS archive (including archive updates), then gradually add new features later. Otherwise it will never get finished.

    Even so, notice that current database model is already generic enough to store all kinds of files in table "downloads", including RZX files. The idea is that filetypes "loading screen" or "in-game screen" should be displayed at top right corner of page, filetypes TAP, TZX or Z80 should be displayed inside section "Download and play links", remaining filetypes should be displayed inside section "Additional material", perhaps filetype RZX could be displayed separately. There's no support for external video links yet, but it would be easy to add later.

    splitting people so that individual roles can be shown/searched. Want to know who did that great music? No problem - want to see what other music they did? No problem...

    Already supported! Take a look at table "roletypes".

    Each author can have multiple roles in the development of each program.

    I just uploaded a second script that populates these generic tables. It's available at the same link here. It will be easier to understand the database model after looking at this new script.

    We were also adding developer notes to titles, and including files - a lot of developers have shown an interest in adding these notes and drawings for archive purposes.

    Good idea!

    There's no need to change the database model, you just have to store links to these files in table "downloads".

    I will insert one more entry to classify "Development sketches/notes" in table "filetypes". This information deserves a different category.

    Inlays/screenshots can be added either as a directory or db entry. I have done publisher logos as a directory, so would have probably carried that on with screenshots and inlays.

    Exactly!

    The database should just contain meta-data information, not the actual files.

    Where you have downloads as a table - I have files. There is also a directory structure in the database that the files link to - so it's a virtual drive.

    Perhaps you misunderstood how this table works?

    Table "downloads" contain a link for the actual location of each file. These files are not stored inside the table.

    Or perhaps I misunderstood what you mean here?

    When you view a publisher on the dev site at the moment - all of their files have the details pulled from the database first - so more information can be stored

    I cannot see how your description differs from this database model...???

    and the filenames are encrypted as a level to stop leeching.

    If you want to protect file references through encryption or any other method, that's fine. But in this case, it's better to scramble this information in the server interface only. Not inside the database or file system.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • The folders table is index linked. The files table has loads of file info - and the folder ID.
    If you look here:- dev.worldofspectrum.org/infoseek/publishers/view/oceansoftwareltd

    You'll see that the filenames are encrypted. But when you download them, they are the correct name. All of the extra details like image dimensions, filetypes, size, etc are stored in the database when they are uploaded. The files module also counts each download prior to sending it.

    The files are physically stored outside of the public_html so they can't be mass leeched.

    You also need to consider.... there are a lot of infoseek changes, and Martijn was doing 30-40hr weeks, and still taking holidays to catch up - so updates are constant. Won't that make storing in memory defunct?

    Also, you will need several admins - they will need an extremely easy interface aimed at no technical knowledge. At the moment on the dev site, they have a list of changes they have permissions for, and just click approve or decline. This would put a verify button on the front end for users with verify permissions to double check the data.
    My test signature
  • The thinking is roughly the same as mine - however I had written a cache module so the DB search results are all stored in a folder - makes it easy to delete them when there is a change too. Every module is separated, so individual caches can be deleted via the admin interface.

    Yes, it was the same thinking! Although I had in mind a different approach.

    Instead of caching query results, I had in mind to eliminate all queries, by keeping all metadata in memory and rendering pages directly from it.

    Instead of a custom cache module on disk, I had in mind to simply enable content cache in Apache HTTP Server, in order to avoid rendering the same page more than once.

    The server could re-read entire archive database only once per day in background, then replace entire data structure in memory at once with a single atomic operation without anybody noticing. The cache could be cleaned once per day, shortly afterwards. Nobody is going to complain if the WoS archive (only) takes a day to reflect archive updates (although this interval could be shortened depending on server capacity).
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • The folders table is index linked. The files table has loads of file info - and the folder ID.
    If you look here:- dev.worldofspectrum.org/infoseek/publishers/view/oceansoftwareltd

    You'll see that the filenames are encrypted. But when you download them, they are the correct name.

    OK. But there's no need to encrypt links in database or file system. Only in your interface.

    For instance: once per day, generate a random encryption key. Display HTML pages with file links encrypted with this key. Whenever there's a download request, decrypt link information with the same key. Keep the last 2 encryption keys. This way, you will be serving links valid for 24h to 48h only, without the need to encrypt your internal database or file system.

    All of the extra details like image dimensions, filetypes, size, etc are stored in the database when they are uploaded. The files module also counts each download prior to sending it.

    The files are physically stored outside of the public_html so they can't be mass leeched.

    OK, no problem.

    You also need to consider.... there are a lot of infoseek changes, and Martijn was doing 30-40hr weeks, and still taking holidays to catch up - so updates are constant. Won't that make storing in memory defunct?

    Sure, but that's OK!

    I doubt anybody would care if the WoS archive took a day to reflect archive updates.

    Also, you will need several admins - they will need an extremely easy interface aimed at no technical knowledge. At the moment on the dev site, they have a list of changes they have permissions for, and just click approve or decline. This would put a verify button on the front end for users with verify permissions to double check the data.

    Ideally, yes. Providing "maker-checker" support would be the perfect solution...

    However, there's no need to implement a perfect solution from scratch. It's so much easier to make the system work first, then improve it gradually later.

    Here's a much simpler multiple-step plan:

    1. Implement a simple form to execute "archive updates", only accessible by admins.

    2. Add a "new submission" form for new archive entries only, just like old WoS. Actual database inserts would be performed only after admin approvals. This form would automate adding new titles, but all other updates to existing titles would still require direct work from admins.

    3. Add properly "human readable" logs for archive updates, so anyone could easily audit changes performed by others.

    4. Expand access to "archive updates" to a few reliable users, besides admins. Although

    5. Finally implement "maker-checker" support.

    This way, process efficiency will improve over time, but it won't be needed to wait for a long time until the perfect solution is implemented before obtaining some results.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Depends how you code. I do it modular and make the code re-usable. By spending 12 months getting these modules done & right makes it easier to do the others. Adding a flag for "verified" is a 2 second job that might as well be done from the start - so that the imported data is checked before the old site goes offline.

    Similarly, the update & additions form are the same module, so easy to get them copied.

    You do realise I have already done most of what you are suggesting?
    My test signature
  • You do realise I have already done most of what you are suggesting?

    No, That's even better!

    I was working under the assumption that whoever took WoS development would have to start all over again. Perhaps you could post somewhere a description of the work already finished, and what still needs to be done, in order to avoid unnecessary rework. Also technical details such as environment, programming language, frameworks and technologies required to continue this work.

    Regardless of this, you mentioned that database importing still needs to be done, and it won't be easy. Do you mind if I go ahead and do it? Even if whoever takes over WoS development decides to adopt a different database structure, converting data will be much easier after everything's properly imported into this ZXDS model. Could you please send me whatever data you have?
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • * The first few tables in this model will just contain generic data (countries, idioms, websites, platforms, tape loading schemes, etc) to be referenced by other tables. I have already sorted out their (suggested) content, and I will post this script tomorrow.

    The database script is available here. I just hope others (particularly WoS management team or whoever may continue to develop this site) will find it useful. Feel free to use it!

    Nice!

    Any reason your countries have custom ids instead of iso country codes? The same applies to idioms.
  • edited July 2016
    Stefan wrote: »
    Any reason your countries have custom ids instead of iso country codes? The same applies to idioms.

    Because I kept internal WoS codes, in order to make it easier to import WoS data.

    Once everything's imported into the new database structure properly, I will replace these IDs with proper numeric values (ISO numeric codes).

    How did I find out the internal WoS codes? Looking at the page source of Advanced Search at Infoseek :)
    Post edited by Einar Saukas on
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    Stefan wrote: »
    Any reason your countries have custom ids instead of iso country codes? The same applies to idioms.

    Because I kept internal WoS codes, in order to make it easier to import WoS data.

    Once everything's imported into the new database structure properly, I will replace these IDs with proper numeric values (ISO numeric codes).

    How did I find out the internal WoS codes? Looking at the page source of Advanced Search at Infoseek :)

    You still can't do much without the data unless you re-enter it from scratch. Or pull it from the JSON extract. Or unless it is released which I don't think is going to happen.

    Post edited by jonesypeter on
  • I won't be releasing my scripts - some of them form a part of my business unfortunately.

    Whilst you are doing the new site, don't forget you will also need to maintain the old site - that needs a dedicated server, and the scripts are compiled on the server. Then you will have to put up with complaints about updates to infoseek - which you will have to do. Or bring it up & get the files on whilst you build the replacement.

    You can't use a shared host - even one with "unlimited" hosting as they are never unlimited - and you need to be able to SSH to the server to run updates and get the site going.

    You'll also need plenty of space - Gerard has mailed me a hard drive this week just to put the magazines on as they are 200gb. You can't really use the "I haz teh WoS" folders online as there is loads missing.

    Having said that - I have added Einar as a temp admin to the new site to see how it works - I am sure in the next few weeks we can get the database sorted at least.
    My test signature
  • I won't be releasing my scripts - some of them form a part of my business unfortunately.

    Whilst you are doing the new site, don't forget you will also need to maintain the old site - that needs a dedicated server, and the scripts are compiled on the server. Then you will have to put up with complaints about updates to infoseek - which you will have to do. Or bring it up & get the files on whilst you build the replacement.

    You can't use a shared host - even one with "unlimited" hosting as they are never unlimited - and you need to be able to SSH to the server to run updates and get the site going.

    You'll also need plenty of space - Gerard has mailed me a hard drive this week just to put the magazines on as they are 200gb. You can't really use the "I haz teh WoS" folders online as there is loads missing.

    These are good points. Whoever takes over WoS site development must not underestimate how much work is involved.

    Just to make it clear, my contribution will be sorting out the database, then assisting with SQL queries (or alternatively an initialized data structure in Java) to reduce the burden on anyone willing to continue WoS development.

    I'm already taking a big enough task, there's no chance I can be convinced to do anything else...

    Having said that - I have added Einar as a temp admin to the new site to see how it works - I am sure in the next few weeks we can get the database sorted at least.

    Thanks! I will take a look later tonight!!!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Stefan wrote: »
    Any reason your countries have custom ids instead of iso country codes? The same applies to idioms.

    Hmm... On second thought, I could index tables using standard IDs, and just store WoS codes in temporary columns to be removed later. OK, I just uploaded these changes. Thanks for bringing up this issue!!!

    I also uploaded a 3rd script I put together today, containing a list of magazine names (obtained from multiple sources including the old WoS ftp site) and some info about competitions (copy/pasted directly from Infoseek pages). It's nice to see some actual data inside the database :)
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Because I kept internal WoS codes, in order to make it easier to import WoS data.

    Once everything's imported into the new database structure properly, I will replace these IDs with proper numeric values (ISO numeric codes).

    How did I find out the internal WoS codes? Looking at the page source of Advanced Search at Infoseek :)

    I suspected as much but didn't see the country codes used after a very quick look at a single infoseek page. Sorry for suggesting that you would have used proprietary codes in 2016, I should have known better. :-)
  • Hmm... On second thought, I could index tables using standard IDs, and just store WoS codes in temporary columns to be removed later. OK, I just uploaded these changes. Thanks for bringing up this issue!!!

    I also uploaded a 3rd script I put together today, containing a list of magazine names (obtained from multiple sources including the old WoS ftp site) and some info about competitions (copy/pasted directly from Infoseek pages). It's nice to see some actual data inside the database :)

    Nice! I would use iso two character country codes instead of numbers.
  • Stefan wrote: »
    Nice! I would use iso two character country codes instead of numbers.

    I'm considering numeric codes for countries and two character codes for idioms should help me to avoid mixing them by mistake, when messing directly with SQL to import unorganized WoS data. Then I may consider changing it afterwards :)
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    For countries and idioms I would suggest that you use a generated ID to be used by Foreign Keys, it's faster and works better (smaller indexes).

    As soon as convenient, extra fields like ISO2, ISO3 or others, should be defined, so that you can reference them by some easy human readable form, if not there from start.

    For them to work as expected, they only need to have Unique Indexes on each extra column like "ISO2", since they must act like a Table Primary Key.
    For added extra column keys, mark them as NOT NULL, after all records have been filled with extra column data.

    I'll have a look at the schema, since that's something I do almost everyday.

    Is this really going down ?

    @Lee Fogarty are you really stepping out ?
    I was kind of hopping that you would review your decision.

    But if you are really going for it, than you should "document" how you expect to do this handover.

    Also WOS is not only archive, but also forums.
    What can we expect on this front ?
    Post edited by RMartins on
  • RMartins wrote: »
    For countries and idioms I would suggest that you use a generated ID to be used by Foreign Keys, it's faster and works better (smaller indexes).

    As soon as convenient, extra fields like ISO2, ISO3 or others, should be defined, so that you can reference them by some easy human readable form, if not there from start.

    I suspect using TINYINT here won't make any noticeable difference in the overall query execution time. If it does, I will certainly change this later. Notice that, if ISO codes won't be used as keys, there's no need to store them in these tables at all. They will never be used anywhere else.

    However you just reminded me I originally planned to review datatypes after populating other "enumeration" tables, but I forgot to do so. Now I have just uploaded another version. Thanks!

    RMartins wrote: »
    For them to work as expected, they only need to have Unique Indexes on each extra column like "ISO2", since they must act like a Table Primary Key.
    For added extra column keys, mark them as NOT NULL, after all records have been filled with extra column data.

    I know :)

    RMartins wrote: »
    I'll have a look at the schema, since that's something I do almost everyday.

    Thanks!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
Sign In or Register to comment.