Database model ZXDB

1246730

Comments

  • It turns out the database contains both Alvaro Mandago Cruz:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Alvaro+Mandago+Cruz$

    And also Alvaro Mangado Cruz:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Alvaro+Mangado+Cruz$

    I suspect they are the same person, and last spelling is correct. Could someone please confirm it?

    You're asking for a world of pain going down this road, believe me, as I've done it myself. There are some inconsistencies that are undoubtedly entry errors, probably like the one you mention, but it didn't help that sometimes the authors themselves misspelled their names in the software or used different spellings.

    For example, from this recent thread we now know that D.A. Cooper and Darren Cooper are one and the same. Impossible to know without this kind of confirmation though.

    Personally I'd ignore this in the database for the time being and just load all the distinct authors as they are. When it goes open-source to the community get them involved instead and do the donkey work for you :-)
  • edited July 2016
    Vampyre wrote: »
    You're asking for a world of pain going down this road, believe me, as I've done it myself. There are some inconsistencies that are undoubtedly entry errors, probably like the one you mention, but it didn't help that sometimes the authors themselves misspelled their names in the software or used different spellings.

    For example, from this recent thread we now know that D.A. Cooper and Darren Cooper are one and the same. Impossible to know without this kind of confirmation though.

    Very well said. Having been collecting a lot of data for newer games, I've noticed the only way to keep good consistent data is to not change anything until you have real proof of. Release dates that are only written on paper but never really released, people getting their own name wrong, reviews that get facts wrong and as a result, people changing database because of wrongly written reviews... A "consistent" database full of data that one have made up and haven't been checked isn't a consistent database.

    I also applaud the idea of trying to fit all the data inta a new database, of course. But it's not much use other using this for the start of a new site. For data entry purposes, such a database is the worst kind of enemy. I suspect the reason why Martijn was able to update the "database" so quickly and accurately is because he went outside of the database to update (best practice here!), and I pity the person who would have to keep a new database up-to-date.
    Personally I'd ignore this in the database for the time being and just load all the distinct authors as they are. When it goes open-source to the community get them involved instead and do the donkey work for you :-)

    Well, it's not really "donkey work", more like the opposite. Fact-checking is very serious work. If anyone claims anything, you still won't be sure if it's true or not.
    Post edited by Timmy on
  • edited July 2016
    I do agree with that you said about consistency - change nothing without proof. The magazines are an absolute hive of getting things wrong, particularly with some of the reviews some of which the descriptions are blatantly incorrect.

    I remember a discussion with Martijn years ago about some of the older Ultimate games. He had no authors for some of them because there was no concrete proof anywhere that the Stamper brothers had actually written them. I assume he relented eventually, or maybe the Stamper brothers confirmed it, as they're now flagged as the Stampers authoring them.
    Post edited by Vampyre on
  • Zup wrote: »
    Yesterday I was looking for a game on infoseek that had known errors. Links to broken errors are... broken, so I was wondering if I could extract this data using the database available and some custom views. Is it possible?

    Yes, I plan to store "known errors" information in this database too.
    Zup wrote: »
    Also, what does contain that infoseek database? The tipshop data is stored as links, or is stored on a table on that database? Are info from other sites stored there?

    Information from other sites as stored as links. It wouldn't make sense to copy content from other sites into this database.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    Attribution is always a problem.
    Found that out when uploading some loading screens to ZX-Art.
    On that site the solution is simple, just comment that the author might be X or Y.
    Then other users will confirm the info if that's the case and an admin makes the correction when he has the time.

    For me main feature of WOS is really the "database" and updates.
    I think that the forum, although its nice, has generated noise and derailed the entire project a bit.
    Completely agree that removing bot accounts takes time that would be better spent on doing actual updates.
    Just my 2cents. :-)
    Post edited by 4throck on
  • edited July 2016
    Zup wrote: »
    Yesterday I was looking for a game on infoseek that had known errors. Links to broken errors are... broken, so I was wondering if I could extract this data using the database available and some custom views. Is it possible?

    Hello Zup,

    Please find below a link to a csv of all the games with known issues. If you need any other fields, or anyone else wants any exports just let me know.

    https://www.dropbox.com/s/ilug7tk6589wm4a/GamesWithKnownErrors.csv?dl=0

    For those who care, this is the SQL:
    SELECT        Infoseek_Source.id, Infoseek_Source.fulltitle, Infoseek_Source.yearofrelease, Infoseek_Source_Publisher.name, Infoseek_Source.language, Infoseek_Source.machinetype, Infoseek_Source.spotcomments, 
                             Infoseek_Source.type
    FROM            Infoseek_Source INNER JOIN
                             Infoseek_Source_Additional ON Infoseek_Source.id = Infoseek_Source_Additional.id INNER JOIN
                             Infoseek_Source_Publisher ON Infoseek_Source.id = Infoseek_Source_Publisher.id
    WHERE        (Infoseek_Source_Additional.info LIKE '%errors')
    ORDER BY Infoseek_Source.fulltitle
    

    Regarding updates. I think its fine, as long as you have a good audit log, and a limited number of people making changes. It sounds like Einar has this in place. @Timmy can you share the Google Forms for your site so we can start creating the new records in our database?

    I have been waiting for years for this, and I'm very excited about the work going on.

    Post edited by jonesypeter on
  • Vampyre wrote: »
    Personally I'd ignore this in the database for the time being and just load all the distinct authors as they are.

    I don't have much choice. Not fixing mismatches would give me invalid foreign keys. And I certainly don't want to sacrifice database constraints.

    For instance "Waner Brothers" doesn't exist:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^D.C.+Comics+Inc$

    A "Minesweeper Bytemaniacos" clone is not a real thing:

    http://www.worldofspectrum.org/infoseekid.cgi?id=0014853

    "Kersherweb" and "Kesherweb" are 2 different words:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Kersherweb$

    And so on...
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • Please find below a link to a csv of all the games with known issues.

    However this will only list which games have known errors. It won't give any information about these errors.
    Regarding updates. I think its fine, as long as you have a good audit log, and a limited number of people making changes. It sounds like Einar has this in place.

    Sorry but I don't. My goal is to convert current archive into a fully consistent database. This is an essential step to build a new WoS to handle future updates, but someone else will need to take over the following steps.

    I cannot afford to dedicate more time than I have committed already.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    Hi Einar, sorry I meant to say that I imagine you are making notes of the changes you are making to the original infoseek data, and not that you would continue to update it. That's up to the new site owner. Can I request that if ZXDB is used that you make it a condition that the new WoS site admin realease zxdb at each update? Then we are all happy and can ensure we don't go back to a closed system.

    I agree the report I produced only shows which games have errors using the Spot comments, and not whats wrong with them. This does not work on WoS currently so hopefully this will be useful.

    Keep up the good work.
    Post edited by jonesypeter on

  • Regarding updates. I think its fine, as long as you have a good audit log, and a limited number of people making changes. It sounds like Einar has this in place. @Timmy can you share the Google Forms for your site so we can start creating the new records in our database?

    I have been waiting for years for this, and I'm very excited about the work going on.

    Request unable to comply. You must have confused me for someone else.

    (PS. In the future please do not address me somewhere in the middle of a post, I was very lucky to have spotted this, but I don't actually read this thread.)
  • edited July 2016
    @timmy sorry I thought you did the Google Site with the new games, but I must be mistaken

    https://sites.google.com/site/speccy21/home
    Post edited by jonesypeter on
  • Hi Einar, sorry I meant to say that I imagine you are making notes of the changes you are making to the original infoseek data, and not that you would continue to update it.

    Correct!
    Can I request that if ZXDB is used that you make it a condition that the new WoS site admin realease zxdb at each update? Then we are all happy and can ensure we don't go back to a closed system.

    From my point of view, this data "belongs" to WoS and ultimately Martijn. Therefore I don't think I have the right to impose any conditions upon WoS on how to use its own data, regardless of how much work I'm dedicating here to contribute.

    Despite this, it's clear both Martijn and Lee intended to keep data fully accessible through APIs. Decisions regarding the future of WoS lies on their shoulders and I'm sure they will keep it into account.

    The only condition I think is reasonable for me to impose is, if anyone besides WoS decides to run a copy of this database (as backup, redundancy or whatever), then any corrections and changes should be made accessible so this information can benefit WoS also.
    I agree the report I produced only shows which games have errors using the Spot comments, and not whats wrong with them. This does not work on WoS currently so hopefully this will be useful.

    Yes, it will be useful for me to validate ZXDB, thanks!
    Keep up the good work.

    Thank you!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    @timmy sorry I thought you did the Google Site with the new games, but I must be mistaken

    https://sites.google.com/site/speccy21/home

    No problem. And I am not the owner of that site.
    Post edited by Timmy on
  • @timmy sorry I thought you did the Google Site with the new games, but I must be mistaken

    https://sites.google.com/site/speccy21/home

    This site belongs to Hikoki. He already contacted me and kindly offered to provide this data.

    It will be trivial to import information from his CSV, but I need to finish importing ZXDB first before accepting any updates.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • I would just like to say thank you to everyone that is contributing or helping in this thread. Hopefully this will help to bring a better World of Spectrum web site to the Internet :)

    Mark
  • @joneseypeter @timmy et all I had a version with googleforms to update the csv so we could adapt a new one fitting the zxdb model? I can make you access to the speadsheet as collaborators too even pass you the entire control of the site since I use dropbox which will stop winxp support on August :)
  • @Hikoki Thanks very much. Access to the sheet would be great.

    Have you thought of moving to linux? The like of Mint and Ubuntu support Dropbox.

    Peter
  • edited July 2016
    @Hikoki Thanks very much. Access to the sheet would be great.

    I will send you a PM with my Google linked email.

    Have you thought of moving to linux? The likes of Mint and Ubuntu support Dropbox.

    Peter
    Post edited by jonesypeter on
  • Vampyre wrote: »
    Personally I'd ignore this in the database for the time being and just load all the distinct authors as they are. When it goes open-source to the community get them involved instead and do the donkey work for you :-)

    Indeed. It's way too easy to spend enormous amounts of time over-normalising data and never really getting any benefit from it and putting too much of an assumption that everything attributed to, for example, J Smith is the same person is an exercise in futility.
  • AndyC wrote: »
    Vampyre wrote: »
    Personally I'd ignore this in the database for the time being and just load all the distinct authors as they are. When it goes open-source to the community get them involved instead and do the donkey work for you :-)

    Indeed. It's way too easy to spend enormous amounts of time over-normalising data and never really getting any benefit from it and putting too much of an assumption that everything attributed to, for example, J Smith is the same person is an exercise in futility.

    No, I'm not "over-normalising data" and my fixes are not an "exercise in futility"! :)
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • AndyC wrote: »
    Vampyre wrote: »
    Personally I'd ignore this in the database for the time being and just load all the distinct authors as they are. When it goes open-source to the community get them involved instead and do the donkey work for you :-)

    Indeed. It's way too easy to spend enormous amounts of time over-normalising data and never really getting any benefit from it and putting too much of an assumption that everything attributed to, for example, J Smith is the same person is an exercise in futility.

    No, I'm not "over-normalising data" and my fixes are not an "exercise in futility"! :)
    So are we also going to correct all the issues like Spectrasoft not actually being the same company in every instance?

    At some point there are absolutely going to be errors in the data. From the point of getting a working system, it's probably safer to treat every distinct entity as if they were definitely different and every entity that's "the same" with a healthy degree of suspicion unless there is some solid confidence in stating they are the same. In my experience keeping the two very separate at the database level (even if subsequently unified by exposing views) is a lot easier to incrementally improve upon than just mixing everything together and constantly "correcting" it.
  • edited July 2016
    AndyC wrote: »
    So are we also going to correct all the issues like Spectrasoft not actually being the same company in every instance?

    No, I'm only fixing inconsistencies.

    If the information is consistently incorrect, I'm not touching it, because in such cases it's nearly impossible to be 100% sure about the correct way to fix it.
    Post edited by Einar Saukas on
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • hikoki wrote: »
    @joneseypeter @timmy et all I had a version with googleforms to update the csv so we could adapt a new one fitting the zxdb model? I can make you access to the speadsheet as collaborators too even pass you the entire control of the site since I use dropbox which will stop winxp support on August :)

    Well, I don't mind updating your site for now, if you need help. For me, it has a lot of overlap with updating the Games List or the tweets I'm making. You'd need to help me with those google and dropbox stuff though, as I never used google and dropbox before.
  • It turns out the database contains both Alvaro Mandago Cruz:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Alvaro+Mandago+Cruz$

    And also Alvaro Mangado Cruz:

    http://www.worldofspectrum.org/infoseekpub.cgi?regexp=^Alvaro+Mangado+Cruz$

    I suspect they are the same person, and last spelling is correct. Could someone please confirm it?

    Yes, I think you can merge those 2, Einar ;)

    BTW I am happy to see that someone starts correcting the database, something that I did as a hobby for a few years and enjoyed a lot, and being the DB so huge, there were some errors which I happily let Martijn know :)
    Keep it up, Einar and thanks a lot for your effort ;)

  • Ivanzx wrote: »
    BTW I am happy to see that someone starts correcting the database, something that I did as a hobby for a few years and enjoyed a lot, and being the DB so huge, there were some errors which I happily let Martijn know :)
    Keep it up, Einar and thanks a lot for your effort ;)

    Thanks!

    We will really need a joint effort later to process years of accumulated correction reports from the "Infoseek database" category of this forum!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • @Einar Saukas I'm interested in this topic too. Some time ago I wanted to do (almost) the same thing as you 're doing (though using a completely different approach[1]) but back then the wos db was not public (http://www.worldofspectrum.org/forums/discussion/52464/is-infoseek-database-available).
    I managed to import most of the wos db using http://www.worldofspectrum.org/rainbow-wos.zip but I don't have all the metadata (e.g. wosid, description, covers, maps, pokes, etc.). If you managed to get that info, it will be great if you can share your db.

    Cheers,
    BogDan.

    P.S. If you still want to go with SQL dbs I recommend you to use SQLite3 instead, this way everybody can download it and use it, without requiring any (free/proprietary) server.

    [1] Instead to use an SQL DB I'm using a new format based on flatbuffers, here https://github.com/bog-dan-ro/spectacol/issues/34 you can find more info.
  • bog_dan_ro wrote: »
    but back then the wos db was not public

    Yes, I didn't have access to it either, when I defined the database schema. I only obtained access later, when I started to import data.

    bog_dan_ro wrote: »
    If you managed to get that info, it will be great if you can share your db.

    I'm doing it already, see the link in my previous posts :)

    bog_dan_ro wrote: »
    P.S. If you still want to go with SQL dbs I recommend you to use SQLite3 instead, this way everybody can download it and use it, without requiring any (free/proprietary) server.

    Sure, see my comment about this topic in my original post!

    bog_dan_ro wrote: »
    [1] Instead to use an SQL DB I'm using a new format based on flatbuffers, here https://github.com/bog-dan-ro/spectacol/issues/34 you can find more info.

    I'm afraid you are not using the right tool for the job :(

    FlatBuffers is a serialization format. Therefore it's a great way to transfer data or temporarily store a full data snapshot.

    However FlatBuffers is not a data repository. For instance, modifying a single record would require storing a new copy of the entire "database" (several megabytes of data). And even if are willing to do it this way, you wouldn't have transactions or concurrency, thus it would be impractical to have 2 users updating information on completely different parts of the "database" at the same time.
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2016
    @timmy the goal of the site was not overlap wos but make a site for my enjoyment and others with far less info than provided by infoseek so I might share with you all a friendlier version later, with googleforms to update the spreadsheet and more fields as per the zxdb model. Surely I would need help to make such new spreadsheet..
    Post edited by hikoki on
  • Whilst checking data credibility I think I found the longest game title on WoS!

    http://www.worldofspectrum.org/infoseekid.cgi?id=0027953
  • I'm doing it already, see the link in my previous posts :)
    I found something here https://www.dropbox.com/sh/bgtoq6tdwropzzr/AAAuMt4OlA_RicOBgwQLopoMa/ZXDB?dl=0
    bog_dan_ro wrote: »
    [1] Instead to use an SQL DB I'm using a new format based on flatbuffers, here https://github.com/bog-dan-ro/spectacol/issues/34 you can find more info.

    I'm afraid you are not using the right tool for the job :(

    FlatBuffers is a serialization format. Therefore it's a great way to transfer data or temporarily store a full data snapshot.

    However FlatBuffers is not a data repository. For instance, modifying a single record would require storing a new copy of the entire "database" (several megabytes of data). And even if are willing to do it this way, you wouldn't have transactions or concurrency, thus it would be impractical to have 2 users updating information on completely different parts of the "database" at the same time.

    First and foremost I choose FlatBuffers because it's a super fast and flexible serialization format, you can add any fields to tables and you still have backwards and forwards compatibility. I need it to put in a single file the metadata for any ZX files but also the files them self. The format also needs to be easily used by any emulators that want to use it.

    But FlatBuffers is much more than a serialization format :). You can easily create a DB using it, of course when you need to update the DB you'll need to rewrite it. But this is not a problem for a ZX database because there are not too many changes these days and because all ZX apps that I managed to gather (from various sources not only from wos) are less than 1Gb which can easily fit into any modern phone RAM. Because they are in memory, searching is also very fast (less than 3ms/query on my laptop searching trough ~10k of records).
Sign In or Register to comment.