Database model ZXDB

1356730

Comments

  • RMartins wrote: »
    Off Topic: How do we make a reference to a user that has spaces in his name ?

    What do you mean?
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • AndyC wrote: »
    Had a quick peak at the schema. Few things I'd definitely suggest:

    Thanks for the suggestions! See my answers below:

    AndyC wrote: »
    Avoid calling everything ID. It's easy, but down the line rapidly becomes difficult to work with. Even just going with a convention like TableNameID is much less error prone.

    Sorry I disagree. I don't think "TableNameID" is any better than "TableName.id".

    AndyC wrote: »
    Avoid "magic" numbers, such as the players value in the programs table. Instead make a lookup table which details what the values 0,1,2,3 etc mean and a FK constraint to enforce consistency. Even if you don't technically "need" the table for things to work, it's much more self-documenting and makes it much easier to produce a meaningful data export down the line (avoiding the situation where someone assume 3 means 3 players, for example)

    Actually it seems WoS internally stores the actual number of players. If so, it makes even more sense to just have a numeric value there. I still need to confirm this.

    AndyC wrote: »
    Think long and hard about whether you really want Nullable VarChar columns. Unless there really is a meaningful distinction between a NULL value and an empty string, you want them NOT NULL. There is almost no situation with a meaningful distinction. There are a few other cases were FK columns are nullable, but it might not be the ideal representation - for example the author => roletype relationship is nullable, presumably to mean "don't know", but querying data can get cumbersome with that and you may be better off with a roletype of "Unspecified" and an enforced relationship.

    Both options are valid. It's a matter of preference.

    AndyC wrote: »
    Generally I'd recommend camelCase or PascalCase names rather than using underscores, particularly as the don't seem to be used consistently.

    This is not an option. When I'm finished, I will export a "clean" database definition. When I do it, all upper/lower cases will be lost.

    Anyway I will review name consistency.

    AndyC wrote: »
    And, whilst it's not a massive deal since they're pretty much a one off thing, I'd have avoided scripting out the data population scripts as quite so many INSERT VALUES scripts because all those sub-selects are cumbersome, slow and a PITA to debug. An INSERT SELECT with appropriate joins is so much easier (especially when debugging as you can run the SELECT part independently, which you can't with a VALUES clause)

    Again, I will export a "clean" database definition when I'm done. So the way I'm writing these queries here is irrelevant, they will all disappear in final version.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Dave_C wrote: »
    Can I ask a question? A while ago some loudmouths very vocal individuals on Facebook were claiming that a WOS database and site could be built in a matter of days simply by scraping search content. Were they talking complete boollocks nonsense?

    If WoS "database" was fully consistent, just importing it to the new database schema I defined would take only a few days. Unfortunately it's not. I'm spending a lot more time stopping to analyze each inconsistency and making sure each entry is correctly fixed and imported, than actually importing everything.

    Of course I could simply ignore inconsistencies to produce an incomplete database, or modify the database schema to allow inconsistencies. I would have already finished my work by now, and most people wouldn't notice the difference. But I won't.

    Scraping search content wouldn't be any faster than importing from existing files, and it would have the same issues regarding inconsistencies. Also parts of the WoS "database" are not even associated with programs, authors or publishers, so just looking at search contents directly would miss a lot of data.

    In a nutshell, building an incomplete "quick-and-dirty" new WoS database could be done considerably faster. A proper WoS database, not really.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
    Thanked by 1Dave_C
  • Does anybody know what's a ZX-Spectrum 4?

    http://www.worldofspectrum.org/infoseekid.cgi?id=0025981
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Could someone please help me, by testing both these entries on all Speccy models (using a reliable emulator) to see which ones work?

    Some programs may be very small but use shadow screen. Or only work on 128K in USR0 mode. And so on.

    Thanks!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Could someone please help me, by testing both these entries on all Speccy models (using a reliable emulator) to see which ones work?

    There's no need anymore, Peter Jones already figured it out. Thanks!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Just a quick status report:

    I have uploaded roughly half the WoS database so far. The latest results are available here. Again, thanks a lot to Lee Fogarty, Peter Jones, especially Gerard Sweeney, and even Martijn for their support!!!

    Also many thanks to Chris Bourne, who imported WoS to SQL Server directly from the JSON feed and kindly allowed me to use his material. As soon as I finish importing the entire WoS database, I will compare my results against his work for validation.

    For the record, I'm converting the entire WoS database the "hard way", from the original internal WoS files. The reason is, I want to preserve everything, not just the "visible" parts of WoS. This way, I can have access to extra information such as the distinction if a certain game was released without an inlay, or if the inlay is just missing... Also I'm coding several cross-validation routines along the way, that are allowing me to detect database inconsistencies, identify "lost" information like non-indexed games and publishers, etc. It's taking much longer this way but I think it's worth it! :)
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • You've done a really good job there Einar, and at least you've proved that it's not just an afternoon's work just to fix the database as so many of our critics believe.
    I wanna tell you a story 'bout a woman I know...
  • karingal wrote: »
    You've done a really good job there Einar

    Thanks!
    karingal wrote: »
    and at least you've proved that it's not just an afternoon's work just to fix the database as so many of our critics believe.

    Only if they meant converting only the "visible" parts of data from JSON (or HTML). Chris was able to do it in 3h :)

    As I mentioned before: checking, analyzing and fixing internal archive inconsistencies is what really takes time. I'm committed to convert 100% of WoS into a consistent database without discarding anything.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Also many thanks to Chris Bourne, who imported WoS to SQL Server directly from the JSON feed and kindly allowed me to use his material. As soon as I finish importing the entire WoS database, I will compare my results against his work for validation.

    You're very welcome. This is such a worthwhile project I wanted to do my little bit to help. It's nice to get a few people involved and foster that community spirit.
    Thanked by 21024MAK Dave_C
  • karingal wrote: »
    and at least you've proved that it's not just an afternoon's work just to fix the database as so many of our critics believe.

    Only if they meant converting only the "visible" parts of data from JSON (or HTML). Chris was able to do it in 3h :)
    No, people were saying they could have WoS up and running at 100% with just a couple of hours work.

    Martijn was spending 50+ hours a week updating and maintaining WoS, with the work you're doing and all the back end stuff that Lee is doing that will be reduced dramatically.

    I wanna tell you a story 'bout a woman I know...
  • edited July 2016
    karingal wrote: »
    karingal wrote: »
    and at least you've proved that it's not just an afternoon's work just to fix the database as so many of our critics believe.

    Only if they meant converting only the "visible" parts of data from JSON (or HTML). Chris was able to do it in 3h :)
    No, people were saying they could have WoS up and running at 100% with just a couple of hours work.

    Martijn was spending 50+ hours a week updating and maintaining WoS, with the work you're doing and all the back end stuff that Lee is doing that will be reduced dramatically.

    It was about 3 hours to develop the T-SQL code that parsed the JSON to XML in SQL Server and subsequently the XML into tables and rows. SQL 2016 would have been quicker as that natively supports JSON but oh well... The SQL coding and database creation took about 3 hours, the actual processing was seconds. Grabbing the JSON itself from the REST API into SQL was done overnight from a CLR function I developed recently. Overnight so as not to spam the REST server.

    As a full time ASP.NET / SQL developer I would estimate at **least** 4/5 months work, full time, to get something approaching the current WoS. Getting something displayed from a db is straightforward in ASP.NET/PHP/etc, but I understand that WoS uses CGI scripts due to disparate data sources - I may have that wrong though. ZXDB is a great idea for bringing all those datasources together and something I really want to see happen.

    Getting something fully administratable, without cheating by going directly into the database, is considerably more work. Particularly when it's going to be administered by mods who aren't necessarily familiar with the db structure and things potentially have to go into confirmation areas. ORM's are all well and good but there are overheads to take into consideration with them so good old T-SQL would be my favoured route for data access. That's just some of the development side: server infrastructure, backups, ddos considerations, file uploads, file storage (that must be tens of gigs on it's own), checking, bandwidth monitoring, etc add much more to the workload and I'll bet that's the tip of the iceberg. Whomever takes over from Lee is a braver man than I. I am, however, happy to help should it be needed. Please anyone, feel free to ask.

    2 hours, 2 weeks or months? Laughable.

    Oh, if anyone else wants the SQL Server database they are more than welcome to pick it up from: https://dl.dropboxusercontent.com/u/24607151/ZXDB SQL Server 2008.zip. It's a SQL 2008 compatible backup from a SQL 2014 engine. All tables are indexed on the [id] column which is the Infoseek Id. All I've done is some jiggery-pokery to get it from a publicly available JSON service into a RDBMS database so I'm happy to make it free for anyone to download. The entirety of the JSON extracts can also be grabbed from https://dl.dropboxusercontent.com/u/24607151/Infoseek JSON Output 20170719.zip. They've been bunged into one huge text file, each call (i.e. each Infoseek entry) separated by a few carriage returns.
    Post edited by Vampyre on
  • Can someone tell NickH that please?

    Obviously I can't as he's blocked me on pretty much all social media. Though having said that we know full well he still reads the forums without logging in (just as you can do on Twitter funnily enough...).
    I wanna tell you a story 'bout a woman I know...
  • So is there any chance that this work won't be just an experiment and will be used by somebody who will build the new WOS ?
  • Ralf wrote: »
    So is there any chance that this work won't be just an experiment and will be used by somebody who will build the new WOS ?

    It should work and be usable but to be totally honest that is a question only Lee can answer.
    I wanna tell you a story 'bout a woman I know...
  • edited July 2016
    I'm really excited about this project. It really shows what the community can achieve. It even sounds like Martijn approves.

    I see it as ideal for those who want to setup their local WoS site. It's great that all this data is freely available to us all, and we are not reliant on other people. I have the whole archive on an external drive like I'm sure many have from the days you could download from the FTP server each day.

    We can share data queries once zxdb is completed and php code samples. I can't wait.

    Could anyone with experience of using the MySQL Workbench migration tool to migrate a Microsoft SQL database please PM me? I'm getting errors.

    Thanks
    Post edited by jonesypeter on
  • I was going to make all the data available anyway - on the dev site you can download a CSV of all the publishers - the intention was to have the same for all areas. There's also the API's so anyone familiar with JSON could run their own version of WoS, but the data is central to WoS so updates are automatic... By using the API, you can then get the files from WoS or other archives if they are quicker.
    My test signature
  • When Lee announced he was leaving, I started ZXDB to ensure all WoS data would be preserved even if the site died. This database, together with an ftp mirror of WoS files, will give everyone access to the same content of WoS.

    However this initiative doesn't eliminate the need for a new WoS site. Most Speccy fans don't know how to run their own databases, so an easily accessible archive is essential to keep the community alive. Also we are still going to need a central reference for future updates (many people updating their own databases independently would quickly become impossible to reconciliate).
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Ralf wrote: »
    So is there any chance that this work won't be just an experiment and will be used by somebody who will build the new WOS ?

    Using ZXDB is really the only viable choice for whoever builds the new WoS. Unless the person prefers to repeat exactly the same work I am (actually we are) doing here!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • I agree with Einar about future updates need to be done centrally. I'm hoping that whoever takes over from Lee with publish regular database update so anyone in tge community can update their copy of zxdb without being forced down the API route. New Infoseek IDs could be agreed by something as simple as a Google Sheet readable by all and editable by a small group of maintainers chosen by the community.

    The CSV of everything never materialised so ZXDB give the answer many of us wanted.

    Lee has done a huge amount of worj on the security side of the site, and hopefully sone of this can be passed on without damaging Lees commercial interests.

    When ZXDB is complete I intend publishing a series of articles on using Wamp or xamp so anyone with reasnable IT skills can set something up for their own personal use.

  • edited July 2016
    I agree with Einar about future updates need to be done centrally. I'm hoping that whoever takes over from Lee will publish regular database update so anyone in the community can update their copy of zxdb without being forced down the API route. New Infoseek IDs could be agreed by something as simple as a Google Sheet readable by all and editable by a small group of maintainers chosen by the community.

    The CSV promised of everything never materialised so ZXDB gives the answer many of us wanted.

    Lee has done a huge amount of work on the security side of the site, and hopefully some of this can be passed on without damaging Lees commercial interests as previously mentioned.

    When ZXDB is complete I intend publishing a series of articles on using Wamp or xamp so anyone with reasnable IT skills can set something up for their own personal use.

    Post edited by jonesypeter on
  • Playing around with Vampyre's infoseek extract, there is certainly a possibility of building a solid database from it. Normalising/Denormalising data in TSQL is pretty much my bread and butter, and it really wouldn't be that difficult to move what's there into something more relational and robust. Obviously you'd lose anything not contained within infoseek though, so a database built using the real source files would be potentially much better.
  • edited July 2016
    I have taken the Microsoft SQL database created by Chris which is based on the infoseek JSON output (Thanks again to Chris for his work), and converted it to MySQL.

    Its a huge script, but if anyone wants a copy its available on my DropBox account. Just remember to create the database first. I have it running on my Raspberry Pi 3 (image below) without problems but I had to increase the size of max_filesize, and the max processing time.

    This does not replace the amazing work which Einar is doing (he is cleaning the data), this is just raw straight from the JSON data (so should match infoseek as it currently is)

    If anyone wants any assistance then please do ask. I'm very happy to help (as long as the results are shared with the community). I hope you find it useful.

    For anyone interested I used 'Database Converter for MySQL from SQLMaestro' You can download a 30 day free trial, but you have to click a button for every 100 records transferred so my click finger is worn out'!

    https://www.dropbox.com/s/chex9giz37u8xe3/metadata.zip?dl=0

    Peter

    https://s32.postimg.org/rv2i7qrn9/database.png

    Post edited by jonesypeter on
  • Great work Peter, thank you!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • It turns out the database contains both Alvaro Mandago Cruz:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Alvaro+Mandago+Cruz$

    And also Alvaro Mangado Cruz:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Alvaro+Mangado+Cruz$

    I suspect they are the same person, and last spelling is correct. Could someone please confirm it?
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • I just spent a few hours fixing several WoS archive inconsistencies. If anyone's interested to check current progress, I just uploaded the latest version.

    The database is now roughly 60% converted.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
    Thanked by 11024MAK
  • I have absolutely no idea what is going on.....BUT IT SOUNDS BRILLIANT. :D
  • Keep them coming :)
  • Yesterday I was looking for a game on infoseek that had known errors. Links to broken errors are... broken, so I was wondering if I could extract this data using the database available and some custom views. Is it possible?

    Also, what does contain that infoseek database? The tipshop data is stored as links, or is stored on a table on that database? Are info from other sites stored there?

    Thanks.
    I was there, too
    An' you know what they said?
    Well, some of it was true!
Sign In or Register to comment.