Import text files of type-ins

edited November 2010 in MIA/STP/SDP
There are ways to read the text of scans and convert them to Word or txt files. So you could do this, then edit and correct them, and import them to run and save. What do you think?
Post edited by MuZiKiLL on

Comments

  • edited November 2010
    You could, yes. I've never tried working from a scan, and your average OCR package probably isn't optimised for recognising Basic keywords as opposed to plain English text, so you may find that making the required corrections is as much work as typing it in by hand.

    But once you've got the listing in a text file, there are several tools that can convert that to a loadable emulator file: BAS2TAP, and my own preference zmakebas (which Gzavsnap has ported to DOS/Windows). I expect BASin can do the job with a quick cut-and-paste too.
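
    Just to illustrate what the .TAP end of that job looks like (this isn't the actual code of any of those tools, just a rough Python sketch), here's how an already-tokenised BASIC program gets wrapped up into a tape file - the tokenising of the text listing is the part that BAS2TAP/zmakebas do for you:

        # Rough sketch: wrap already-tokenised Sinclair BASIC bytes into a .TAP file.
        # (Tokenising the text listing is the hard part, and is what the tools above do.)
        import struct

        def tap_block(flag, payload):
            # one TAP block: 2-byte little-endian length, flag byte, payload, XOR checksum
            body = bytes([flag]) + payload
            checksum = 0
            for b in body:
                checksum ^= b
            return struct.pack('<H', len(body) + 1) + body + bytes([checksum])

        def basic_to_tap(program_bytes, name='program', autostart=0x8000):
            # Program header: type 0, 10-char name, data length, auto-start line
            # (>= 32768 means no auto-run) and start of variables (= program length here)
            header = struct.pack('<B10sHHH', 0, name[:10].ljust(10).encode('ascii'),
                                 len(program_bytes), autostart, len(program_bytes))
            return tap_block(0x00, header) + tap_block(0xFF, program_bytes)

        # e.g. open('listing.tap', 'wb').write(basic_to_tap(tokenised_bytes, 'listing'))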
  • edited November 2010
    Ah, yes, the Holy Grail of the type-inner - that there's some magical way of taking all the effort out of typing in program listings from books and magazines by scanning and OCRing the pages and importing the resultant text files. Sorry, but the hard way is the only way that's worth pursuing. I've got 1635 type-ins on file at TTFn, of which I've prepared about 1100, and if OCR actually saved any significant amount of time in transcribing program listings I'd have used it long ago, but it doesn't and I didn't. It saves on typing time but loses on checking time, and so is no great help.

    It's just not accurate enough; even 99% accuracy would mean that every line would still have to be checked for errors. The mediocre-to-poor print quality of many listings, the fact that they're usually printed on matrix printers whose typefaces don't OCR particularly well, the fact that OCR is generally intended for interpreting human rather than computer languages and so often makes wrong assumptions about what questionable text should be, and the readily misinterpreted 0/D/O - 1/I/l - 2/Z - 5/S - 8/B all conspire to reduce the accuracy of the output. Then, whereas text in a human language can have loads of grammatical, punctuation and spelling errors in it and still be understood, text in a computer language has to be 100% correct before the program will run correctly. As for OCRing hex dumps - that way lies madness.
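
    Just to show the sort of checking that's unavoidable, here's a rough sketch of mine (not anything TTFn actually uses) of a post-OCR pass in Python that repairs the obvious digit/letter mix-ups where a line number is expected, and flags everything else - every line still needs a human eye over it afterwards:

        # Rough post-OCR clean-up sketch: fix the classic 0/O/D, 1/I/l, 2/Z, 5/S, 8/B
        # confusions only where a digit is clearly expected (the line number field),
        # and report anything that still looks suspect.
        import re, sys

        DIGIT_FIXES = str.maketrans({'O': '0', 'D': '0', 'I': '1', 'l': '1',
                                     'Z': '2', 'S': '5', 'B': '8'})

        def clean_listing(lines):
            for n, line in enumerate(lines, 1):
                m = re.match(r'\s*([0-9ODIlZSB]+)\s', line)
                if m:
                    fixed = m.group(1).translate(DIGIT_FIXES)   # touch the line number only
                    line = line[:m.start(1)] + fixed + line[m.end(1):]
                else:
                    print(f'check line {n}: no line number found', file=sys.stderr)
                yield line

        with open(sys.argv[1]) as f:
            sys.stdout.writelines(clean_listing(f))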
  • edited November 2010
    BattleBunny, that's one heck of an impressive type-in website!
  • edited November 2010
    Crisis wrote: »
    Hey BattleBunny
    is this maybe one of the 1635?
    http://www.worldofspectrum.org/forums/showthread.php?t=24221&highlight=25000
    Martijn said it's definitely a type-in

    I don't recognise it, and I've checked the monitor programs which I have listed or on file and none of them look like that one. I also searched the code in case an author's name was in there but I couldn't find one. So I can't identify it, except to say that it did not appear as a type-in in any of the 1200-odd magazine issues indexed on TTFn. If it was a type-in, then it might have appeared in Personal Computer News or Popular Computing Weekly - a lot of those issues aren't indexed; or perhaps in a book - not my area.
  • edited November 2010
    Well, I tried it, but the JPG resolution I got was not optimal. Loads of errors.

    It was Dr Ian Logan's DRAW An Arc routine, explained from ROM to BASIC. Ended up typing it in by myself (kinda short).

    Used BASin to get a .BAS and then a .TAP.
    The file sequence was: .JPG -> .TXT -> (BASin/Paste Code) -> .BAS -> .TAP :grin:

    I think that with a better scan resolution (mag code font was smaaaaallll in those days!) and the OCR's spell check turned OFF, you could get somewhere. I still think there WILL be errors even doing so (heck, that's why code from mags never worked as one expected first try :evil: ).

    Regards,
    Marcelo.
  • edited November 2010
    Maybe different scanning utilities give different results. Which ones have you used, and which give the best results?
  • edited November 2010
    In the past I've tried various versions of TextBridge and OmniPage; more recently MS Office Document Imaging. I do re-visit the problem occasionally, but I always eventually get fed up and go back to typing them in, and just use OCR for the text of accompanying articles and instructions.
  • fogfog
    edited November 2010
    Don't use JPEG... set it to black/white, NOT grayscale. I find upping the dpi to 1200 fixes trickier ones.

    I use "top ocr"; results are hit and miss.
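
    If all you've got is an existing greyscale or JPEG scan, you can fake some of that after the event - a minimal Python sketch with the Pillow imaging library (the scale factor, threshold and filenames are just guesses to tune):

        # Minimal pre-OCR clean-up sketch with Pillow: greyscale, upscale, then a hard
        # black/white threshold, saved losslessly as PNG rather than JPEG.
        from PIL import Image

        def prepare_for_ocr(in_path, out_path, scale=4, threshold=128):
            img = Image.open(in_path).convert('L')                 # to greyscale
            img = img.resize((img.width * scale, img.height * scale))
            img = img.point(lambda p: 255 if p > threshold else 0).convert('1')
            img.save(out_path)

        prepare_for_ocr('listing_scan.jpg', 'listing_scan.png')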
  • edited November 2010
    gasman wrote: »
    But once you've got the listing in a text file, there are several tools that can convert that to a loadable emulator file: BAS2TAP, and my own preference zmakebas (which Gzavsnap has ported to DOS/Windows). I expect BASin can do the job with a quick cut-and-paste too.

    Just to add <plug>that there's also an open-source Linux/Unix tool called bast, written by me, which also does this (and more!)</plug>. Actually bast can also be used on hex listings (with an XOR checksum of each row of 8 bytes; hopefully I'll make the format more flexible in a future version). Though, as mentioned elsewhere, trying to OCR a hex listing is crazy.
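
    For the curious, checking that kind of row is trivial to script - a rough Python sketch (the column layout here is an assumption, not necessarily bast's exact input format):

        # Rough sketch: verify a hex listing where each row ends with 8 data bytes
        # followed by an XOR checksum of those 8 (the column layout is an assumption).
        import sys

        def bad_rows(path):
            with open(path) as f:
                for n, line in enumerate(f, 1):
                    fields = line.split()
                    if len(fields) < 9:
                        continue                     # blank line, address-only row, etc.
                    *data, checksum = (int(x, 16) for x in fields[-9:])
                    xor = 0
                    for b in data:
                        xor ^= b
                    if xor != checksum:
                        yield n

        for n in bad_rows(sys.argv[1]):
            print(f'checksum mismatch on line {n} - re-type that row')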
  • fogfog
    edited November 2010
    AY Chip wrote: »
    trying to OCR a hex listing is crazy.

    Depends on the quality of the original listing AND whether there are checksums in the listing, I guess.

    Using the 128's serial port to the PC's COM1 etc. is a way, I guess, and adjusting any ASCII accordingly (rough capture sketch below).

    Used to do the same on the C64, with 3.5" disks, but you added or subtracted $20 to the letters... I forget, it was a long time ago.
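
    On the PC side the capture is only a few lines with the pyserial library - a rough sketch (port, baud and the line-ending fix-up are guesses, to be matched to whatever the 128 end is actually set to send):

        # Rough sketch: capture an LLISTed program arriving over RS-232 and save it as
        # a text file. The port name, baud rate and CR handling are all assumptions.
        import serial

        with serial.Serial('COM1', 9600, timeout=5) as port, \
                open('listing.txt', 'w', encoding='ascii', errors='replace') as out:
            while True:
                chunk = port.read(256)
                if not chunk:
                    break                            # nothing for 5 seconds, assume done
                text = chunk.decode('ascii', errors='replace')
                out.write(text.replace('\r', '\n'))  # the Spectrum sends CR line endings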