Database model ZXDB

2456730

Comments

  • I won't be releasing my scripts - some of them form a part of my business unfortunately.

    Does that apply to data from the old site too? You mentioned it would need a clean up, but as you mentioned the community could do this. Thanks

  • The old site scripts won't be made generally available as some may contain confidential information - and they still technically belong to Martijn - so they will be handed to whoever takes over to restore the site as it is.

    With the data clean up, there was a plan for the new site to have an admin permission that just allows data verification, so initially with the import everything needs verifying, then when the site is live, every change an admin makes can be verified. Being the only permission, there can be loads of community members allowed to verify data without risking database problems.
    My test signature
  • Thanks for taking the time to answer Lee. I understand now and won't ask again.
  • Einar, some time ago you recommended a java framework which I don't remember what it was called. Please can you refresh our memory?
    Btw it's great these two beasts Mr. Lee and Mr. Einar are going to work together
  • hikoki wrote: »
    Einar, some time ago you recommended a java framework which I don't remember what it was called. Please can you refresh our memory?
    Btw it's great these two beasts Mr. Lee and Mr. Einar are going to work together
    You've been called worse names than that Lee!!!!
    I wanna tell you a story 'bout a woman I know...
  • edited July 2016
    hikoki wrote: »
    Einar, some time ago you recommended a java framework which I don't remember what it was called. Please can you refresh our memory?

    I recommended using OpenXava. I still believe using this framework, combined with the idea of loading entire database into Java classes in memory, and enabling content cache in Apache, could produce a high performance solution with minimal effort.

    However it's just my suggestion. Whoever takes over WoS development must make this decision, not me.

    hikoki wrote: »
    Btw it's great these two beasts Mr. Lee and Mr. Einar are going to work together

    Thanks, I suppose :)
    Post edited by Einar Saukas on
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
    Thanked by 1hikoki
  • edited July 2016
    RMartins wrote: »
    For countries and idioms I would suggest that you use a generated ID to be used by Foreign Keys, it's faster and works better (smaller indexes).

    As soon as convenient, extra fields like ISO2, ISO3 or others, should be defined, so that you can reference them by some easy human readable form, if not there from start.

    I suspect using TINYINT here won't make any noticeable difference in the overall query execution time. If it does, I will certainly change this later. Notice that, if ISO codes won't be used as keys, there's no need to store them in these tables at all. They will never be used anywhere else.

    Since these are reference tables, a couple of bytes per entry to make them more human friendly is always good.

    They can be used to make setup SQL auto documented, and to provide valid reference keys, without being Primary Keys.

    For example:
    Lets suppose we provide a way to change site language, where you list a bunch of supported languages, in an HTML SELECT input for example.

    It's bad to show/share Primary Key as the HTML OPTION "value", instead of a human readable code.
    <select>
      <option value="23">English</option>
      <option value="57">Portuguese</option>
      <option value="42">Spanish</option>
    </select>
    

    This next one is a lot better.
    <select>
      <option value="en">English</option>
      <option value="pt">Portuguese</option>
      <option value="es">Spanish</option>
    </select>
    

    It doesn't open the inner working codes to prying eyes, and is human readable, which makes it a lot less prone to errors.

    Here is an example for better SQL initialization scripts.
    INSERT INTO download
       ( id, program_id, idiom_id, filename, ... )
    VALUES
       (  1234, 69999, 57, "SteelBall.dat", ... )
    

    This next one is a lot better
    INSERT INTO download
       ( id, program_id, idiom_id, filename, ... )
    VALUES
    (
       download_seq.NEXTVAL, -- this is Oracle Syntax, for MySQL use AUTO_INCREMENT, and do not declare this field.
       (SELECT id FROM program WHERE title="SteelBall"),
       (SELECT id FROM idiom WHERE iso2 ="pt" ),
        ...
    )
    

    Self documented by being directly human readable, without using obscure/opaque IDs.
    NOTE: Some care as to be taken, to use columns that are Unique, so that they can be used as a public primary key.

    Also having public "keys" and not IDs, helps prevent problems when declaring APIs for further integration with external systems.

    Making APIs requests (REST, etc..) with internal IDs is big NO-NO for portability and overall data management.
    It just violates Object Oriented (Black Box) Encapsulation principle.

    Example:
    GET /WOS/program/list?lang=23
    GET /WOS/program/list?lang=en

    How do you publically publish the language IDs ?
    Add another API to list Languages, so that you can get IDs ?

    Or just declare that Language codes use some standard like ISO 3166 (countries) and ISO 639 (languages).

    NOTES:
    Something I noticed, is that some tables are using the plural form of the name, which is generally not considered good practice, since it doesn't bring any advantage in practice, but usually generates some trouble with problematic english plurals (person/people, etc...).

    A table is a just a container for records of some type, like "program", not "programs", so every table name should be singular, since it can be seen as a name of a type (it's not explicitly a name of a collection) .

    Some might debate that SELECT * FROM programs ... feels more natural, but the accepted good practice convention for SQL declarations/definitions is to always use singular names.

    But it's just a suggestion.


    Post edited by RMartins on
  • Thanks a lot for your schema review and comments! I really appreciate it.

    In general, ZXDB database model is based on the following assumptions:

    * I'm assuming MySQL or MariaDB are more likely to be adopted in practice. Therefore using SEQUENCE is out of question.

    * I'm planning to reuse the same internal keys from WoS Infoseek, because they should make my life easier when importing data from WoS. Therefore, I will rarely need AUTO_INCREMENT keys. In current scripts, IDs for machinetypes, turntypes, featuretypes, competitions were all taken from WoS already. Other keys will be updated as soon as I find out their internal values at WoS. Even so, I'm already aware I could use AUTO_INCREMENT, I will take advantage of it if needed.

    * I won't bother to produce very "human readable" scripts for populating the database. After importing all data from WoS Infoseek, I will simply produce a "clean" database export instead. The reason is, while importing WoS content or afterwards, I'm likely to apply further improvements to the database schema, then export all database content again. Therefore, any effort applied on producing "human readable" scripts will be probably wasted a few times. I will only worry about keeping the schema definition readable, but not other scripts. Even so,

    * I won't worry about "public keys" for APIs either. Whoever uses ZXDB to build a new server later will be in a better position to identify where it will be really needed, and easily add it themselves. I'm currently focused on building a database import, not any underlying system.

    * About plurals for table names, I know I'm not following typical naming conventions... but I thought using somewhat "informal" names in ZXDB would be fun anyway so I'm sticking to it. Sorry! :)
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • There's no hard and fast rules when it comes to table naming convention when it comes to plurals. Oracle, which is my background, suggest plurals for table names whereas I think Microsoft suggest the opposite. I think it's all down to personal preference.
  • edited July 2016
    redballoon wrote: »
    There's no hard and fast rules when it comes to table naming convention when it comes to plurals. Oracle, which is my background, suggest plurals for table names whereas I think Microsoft suggest the opposite. I think it's all down to personal preference.

    Really? I worked on software projects using Oracle most of the time, and I never saw DBAs using plural table names.

    Well, there goes my attempt to do something innovative :)
    Post edited by Einar Saukas on
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    Good to see things moving. I like what I see on the "dev" site. :-)

    Keep in mind that any site maintained as a hobbie must be VERY simple.

    Better limit features to the core ones and get/keep those working.

    For WOS I'd say the main feature is the archive data (the downloads are nice to have but we live without them) and updates. So it's good to see the efforts concentrated there.
    Post edited by 4throck on
  • edited July 2016
    Thanks a lot for your schema review and comments!

    I just took a quick peek at it ;)

    * I won't bother to produce very "human readable" scripts ...
    I will only worry about keeping the schema definition readable, but not other scripts.

    My comments were specifically for reference tables only.
    These should be well documented regardless of possible DB dumps in future.

    When I have some free time I'll produce a graphic representation.
    Post edited by RMartins on
  • OK, I just uploaded here the first few megabytes of WoS database conversion. Thanks a lot to Lee Fogarty, Peter Jones, and especially Gerard Sweeney for their help on this task!!!

    There's still a lot more to do, so this is going to take a while...

    Out of curiosity, it turns out the WoS archive has quite a few inconsistencies that nobody ever noticed. For instance, try clicking on the second link of this page:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Drazen+Petrovic$

    or clicking on some of the titles in this page:

    http://www.worldofspectrum.org/infoseekid.cgi?id=0011149

    These are minor problems that only affect obscure titles nobody cares about. However I'm having to investigate and fix each one of them, otherwise information would be lost during conversion, or database consistency wouldn't allow importing the data. So this work will take quite a bit longer than the time to transform several disjoint files into a single database... but it will be worth it! :)
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
    Thanked by 1Dave_C
  • edited July 2016
    Good work.

    But I have a question.

    If I got it right, there are:
    1 - a DB (old and unsupported vendor) from Martijn,
    2 - a DB (MySql) from Fogarty Lee, supposedly a conversion (partial?) from 1
    3 - a new DB (MySql) being defined now, based on, or an import of 2

    Besides inconsistencies, that should be cleared, if there is a database with the data, why is there a need for a new definition ?

    Even if we are expanding what will be supported, it could always by done by expanding over whatever already exists.

    What is the main reason, for redefinition ?
    Does the current DB, have problems, that need fixing ?

    It just seems we are re-doing the same work again.
    Are we ?
    Post edited by RMartins on
  • RMartins wrote: »
    Good work.

    Thanks!

    RMartins wrote: »
    But I have a question.

    Sure... ?

    RMartins wrote: »
    If I got it right, there are:
    1 - a DB (old and unsupported vendor) from Martijn,

    Nope.

    The WoS archive never had a database, Everything's stored in a collection of disjoint files, without integrity check. Apparently they were parsed, cross-referenced and converted directly to HTML using a proprietary system running on a commercial license that was restricted to a single server that has died some time ago AFAIK.

    RMartins wrote: »
    2 - a DB (MySql) from Fogarty Lee, supposedly a conversion (partial?) from 1

    Nope.

    He had plans to do something similar, but didn't have the actual database tables.

    RMartins wrote: »
    3 - a new DB (MySql) being definid now, based on 2

    Nope.

    Neither database 1 or 2 exists. And I modeled ZXDB from scratch.

    However I have now access to the "disjoint" files from WoS archive and I'm working to turn them into a real database (with help from some people I just mentioned).

    RMartins wrote: »
    Besides inconsistencies, that should be cleared, if there is a database with the data, why is there a need for a new definition ?

    No, there isn't a database with the data.

    RMartins wrote: »
    Even if we are expanding what will be supported, it could always by done by expanding over whatever already exists.

    Nothing already exists...

    RMartins wrote: »
    What is the main reason, for redefinition ?
    Does the current DB, have problems, that need fixing ?

    Current DB's most serious problem is its lack of existence :)

    RMartins wrote: »
    It just seems we are re-doing the same work again.
    Are we ?

    I doubt it!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    I new I saw DB written somewhere :)
    ... the old WoS uses the SPOT database, but this is no longer maintained. ...

    Thanks for explanation @Einar Saukas.

    But then how was @Fogerty Lee, populating Archive pages, and supporting search ?
    Reading these disjoint files ?
    is the old system still working, after all ?

    I had the impression that these things were already working.
    But I'm probably lacking information again.

    Keep up the great work.

    I'll generate the graphical schema, as soon as I can.
    And I'm available to help some more.
    Post edited by RMartins on
  • Off Topic: How do we make a reference to a user that has spaces in his name ?
  • Had a quick peak at the schema. Few things I'd definitely suggest:

    Avoid calling everything ID. It's easy, but down the line rapidly becomes difficult to work with. Even just going with a convention like TableNameID is much less error prone.

    Avoid "magic" numbers, such as the players value in the programs table. Instead make a lookup table which details what the values 0,1,2,3 etc mean and a FK constraint to enforce consistency. Even if you don't technically "need" the table for things to work, it's much more self-documenting and makes it much easier to produce a meaningful data export down the line (avoiding the situation where someone assume 3 means 3 players, for example)

    Think long and hard about whether you really want Nullable VarChar columns. Unless there really is a meaningful distinction between a NULL value and an empty string, you want them NOT NULL. There is almost no situation with a meaningful distinction. There are a few other cases were FK columns are nullable, but it might not be the ideal representation - for example the author => roletype relationship is nullable, presumably to mean "don't know", but querying data can get cumbersome with that and you may be better off with a roletype of "Unspecified" and an enforced relationship.

    Generally I'd recommend camelCase or PascalCase names rather than using underscores, particularly as the don't seem to be used consistently.

    And, whilst it's not a massive deal since they're pretty much a one off thing, I'd have avoided scripting out the data population scripts as quite so many INSERT VALUES scripts because all those sub-selects are cumbersome, slow and a PITA to debug. An INSERT SELECT with appropriate joins is so much easier (especially when debugging as you can run the SELECT part independently, which you can't with a VALUES clause)
  • AndyC wrote: »
    ...
    Avoid calling everything ID. It's easy, but down the line rapidly becomes difficult to work with. Even just going with a convention like TableNameID is much less error prone.

    It's a common accepted convention, to name primary Keys just "id" and foreign keys with "{table}_id" and it's not error prone at all, since there is always the table "context".

    It works well in code and in SQL.
    But I would be thankful if someone showed a situation where this doesn't hold true. Always eager to learn :)
    -- Using alias for example
    SELECT * FROM programs P, licenses L
    WHERE P.license_id = L.id
       AND P.name LIKE 'Steel%'
    
    AndyC wrote:
    ...
    Generally I'd recommend camelCase or PascalCase names rather than using underscores, particularly as the don't seem to be used consistently.
    ...

    It's also a common accepted convention to use "_" to separate words in DB names, and a lot of frameworks also rely on that assumption.

    Can't agree more with consistency, that's key. :)

    So what I would suggest is to use whatever convention(s) are decided, but then stick with them and enforce them, wherever possible.

  • edited July 2016
    The WoS archive never had a database, Everything's stored in a collection of disjoint files, without integrity check. Apparently they were parsed, cross-referenced and converted directly to HTML using a proprietary system running on a commercial license that was restricted to a single server that has died some time ago AFAIK.

    Can I ask a question? A while ago some loudmouths very vocal individuals on Facebook were claiming that a WOS database and site could be built in a matter of days simply by scraping search content. Were they talking complete boollocks nonsense?
    Post edited by Dave_C on
  • Dave_C wrote: »
    Can I ask a question? A while ago some loudmouths very vocal individuals on Facebook were claiming that a WOS database and site could be built in a matter of days simply by scraping search content. Were they talking complete boollocks nonsense?

    Yes.
    My test signature
  • edited July 2016
    @Dave_C There is a JSON feed which provides all the WoS details. It was Lee says produced via scraping content. This is what I imagine the Facebook uses would have used?? However we never saw any evidence!

    http://search.kolbeck.dk/sinclair/game/0000001

    A friend of mine started development of a Javascript feed of it, but when I posted it here there didnt seem much interest so we did not take it any further and moved onto other projects.

    Kolbeck has also produced this:

    http://sinclair.kolbeck.dk/

    NickH used the JSON feed to produce his static HTML pages for his site. The example below shows this:

    http://www.ysrnry.co.uk/downloads/0002475.html



    Post edited by jonesypeter on
  • It seems strange that they felt the need to scrape the site when there has always been an infoseek.csv file available, but hey - that's people for you.

    There's no publishers equivalent, or books, or magazines, or anything else - which is easy enough to do. What nobody seems able to do is put it all together into a manageable package that's secure and allows multi-user access to different parts for maintenance. zx-mod looked like it was heading that way, but was Wordpress and seems to have been abandoned. I understand it's taking a different route to WoS now so there will be other resources on there that would be useful.

    I assume whoever scraped WoS is intending to repeatedly scrape for updates once the new site is up - my scripts don't allow that - I'm happy to hand over the principle as to how that works to whoever comes next. Scraping websites is so 1990's.
    My test signature
  • RMartins wrote: »
    It's a common accepted convention, to name primary Keys just "id" and foreign keys with "{table}_id" and it's not error prone at all, since there is always the table "context".

    It works well in code and in SQL.
    But I would be thankful if someone showed a situation where this doesn't hold true. Always eager to learn :)

    When you've got really complex queries with *lots* of joined tables, it's all too easy to mis-type say PR.Id as P.Id and not notice a subtle bug, it's a lot less easy to mistype PR.ProgramID as P.PersonID without an immediate error. Anything that moves bug detection into a compile time thing is always a good move.
    RMartins wrote: »
    It's also a common accepted convention to use "_" to separate words in DB names, and a lot of frameworks also rely on that assumption.

    Can't agree more with consistency, that's key. :)

    So what I would suggest is to use whatever convention(s) are decided, but then stick with them and enforce them, wherever possible.
    Yeah, one or the other works. They reason I don't like _ to separate words (apart from looking ugly) is that _ tends to be used when defining constraint names with conventions like FK_Table1Name_Table2Name_Column and that tends to become difficult to read once you throw in extra _'s as a result of object names.
    RMartins wrote: »
    -- Using alias for example
    SELECT * FROM programs P, licenses L
    WHERE P.license_id = L.id
       AND P.name LIKE 'Steel%'
    
    Ugh. Don't use cross joins like that - it confuses Join conditions with the Where clause ad often results in worse performing queries. It's much better to be specific about what you actually want:
    SELECT *
    FROM programs P
    JOIN licenses L On P.license_id = L.id
    WHERE P.nae LIKE '%Steel%'
    
  • edited July 2016
    It seems strange that they felt the need to scrape the site when there has always been an infoseek.csv file available, but hey - that's people for you.

    @Lee can you explain how you access this .CSV and what the content is?

    If Einar's ZXDB comes to pass and someone implements it as open, then there would I assume be no need to scrape.

    Post edited by jonesypeter on
  • If Einar's ZXDB comes to pass and someone implements it as open, then there would I assume be no need to scrape.

    So, how do updates work?
    My test signature
  • edited July 2016
    Einar's database is available for download, so the community can update it. Would would be interesting is how infoseek IDs are chosen for new titles so we are consistant. I see a committee needed, and an ISO standard.

    Now, about this CSV??
    Post edited by jonesypeter on
  • Einar's database is available for download, so the community can update it. Would would be interesting is how infoseek IDs are chosen for new titles so we are consistant. I see a committee needed, and an ISO standard.

    Now, about this CSV??
    That would be entirely unworkable. Providing web APIs for querying, updating and exporting from a central database would make way more sense.
  • edited July 2016
    AndyC wrote: »
    Einar's database is available for download, so the community can update it. Would would be interesting is how infoseek IDs are chosen for new titles so we are consistant. I see a committee needed, and an ISO standard.

    Now, about this CSV??
    That would be entirely unworkable. Providing web APIs for querying, updating and exporting from a central database would make way more sense.

    I was joking about the committee..... and the CSV was linked to something Lee said was available and not related to ZXDB.

    There seems only one post about this infoseek.csv file, and nothing online. Does it realy exist???

    Post edited by jonesypeter on
  • RMartins wrote: »
    I new I saw DB written somewhere :)

    Probably from me. Whenever I wrote "WoS database" in the past, please replace it with "whatever data repository is currently active behind the WoS system interface, either a typical SQL database or any other data storage system that allows some degree of data cross-reference and search". This should make my previous posts much easier to understand :)
    RMartins wrote: »
    But then how was @Fogerty Lee, populating Archive pages, and supporting search ?
    Reading these disjoint files ?
    is the old system still working, after all ?

    AFAIK only the parts of WoS that allowed modifying and "recompiling" these disjoint files was broken, so they cannot be (easily) changed anymore. But the parts of WoS that generate web pages from them still work.
    RMartins wrote: »
    I'll generate the graphical schema, as soon as I can.
    And I'm available to help some more.

    Thanks! Just keep in mind I'm still making minor adjusts to the database schema along the way, as I learn more about the data I'm importing.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
Sign In or Register to comment.