MULTISEARCH

(from Your Spectrum 12, Mar.1985)



After a brief sojourn writing commercial software, we welcome

programming guru Simon Goodwin back to the pages of YS with his first

major utility since ZIP! Multisearch might be somewhat smaller than its

predecessor but, as a fully relocatable 'search and replace' utility in

just 225 bytes, it too is dedicated to the art of speeding up your Basic

programs. Don't limit yourself to any other utility - make more of

Multisearch!



How many times have you laboriously gone through a ZX Basic program,

replacing one item with another? Well, despair no more, Multisearch will

quickly and automatically find and replace almost any selected item.

This routine is easy to use and is only 225 bytes long. It'll run

anywhere in memory (so it doesn't interfere with other utilities) and,

what's more, turns out to have lots of useful and unexpected

applications.





POWERFUL POSSIBILITIES



The possibilities of Multisearch aren't limited to changing one message

for another. You can use it to edit long program lines, to replace

keywords or to document programs (replacing line number references with

names). Multisearch will also work the other way, replacing names with

numbers - which is very useful if you intend to compile a Basic program

into machine code.



Most interesting of all is the possibility of writing programs which

edit themselves; Multisearch can easily be called while a program runs.

In this article we will investigate the internal format of ZX Basic and

show how you can use Multisearch to make programs faster, more concise,

or to protect them against people who want to fiddle with them

(Troubleshootin' Pete, please note).





INSPIRATION



The idea of Multisearch came when YS reviewed a job lot of 'programmers'

toolkits' a number of months ago. These are designed to make life easier

for Basic programmers, but they all turn out to have a common flaw -

they won't let you replace numbers in a program automatically.



Some of the toolkits had a 'search and replace' facility, but they all

had annoying limitations - for example, Super Toolkit would only replace

single keywords. The suggested use was to change LPRINT into PRINT or

vice versa, but in fact that's pretty pointless because you can get the

same effect on any Spectrum with a standard (but undocumented) command:



OPEN #2,"p"



This sends the output of PRINT statements to the printer until you

cancel it with:



OPEN #2,"s"



If you want to work the other way, you can use:



OPEN #3,"s"



to send the results of every LPRINT statement to the screen. When you

want to use the printer again, the command:



OPEN #3,"p"



will set things back to normal.



It's a bit more useful to be able to replace text in a program - perhaps

you might want to Americanise the word 'colour' by replacing it with

'color', or enforce some similar indignity. But by far the most useful

application baffles every single toolkit - the problem of changing

numeric values within a program.





INSIDE BASIC



The accompanying figure shows the rather complicated way the Spectrum

stores a simple Basic program:



10 PRINT 2+VAL "2"

20 GO TO 10



Most of the data is ASCII code - for instance, 34 is the code of

inverted commas and 236 is the code of the keyword GO TO. A full list of

the keyword values is in Appendix A of the Spectrum manual. Now take a look

at the strange way the Spectrum stores numbers.



Most numbers in a program are also stored in a hidden 'binary form'

which takes up six extra bytes. This is meant to make programs run more

quickly, by removing the need for the computer to convert numbers from

text to binary whenever they are found. In practice, VAL "2323" can be

handled almost as fast as the number 2323, and the first version uses

three fewer bytes, because the string value doesn't have a hidden 'binary

form'.



In the figure, you can see that VAL "2" needs three fewer bytes than '2'

on its own. The number '2' is followed by a 'marker' byte (code 14)

which tells the LIST routine to skip the next five bytes - the binary

form of the number. When the program RUNs, the text is ignored and the

binary form is used.
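
As a rough illustration of that layout, here is a minimal sketch in
Python (a model to run on another machine, not something for the
Spectrum itself) that builds the bytes of the two-line example by hand.
The token codes for PRINT (245) and VAL (176) come from the character
set table; the helper names are invented for this sketch, and it assumes
the five-byte form used for whole numbers from 0 to 65535 (two zero
bytes, the value low byte first, then a final zero).

```python
ENTER, NUMBER = 13, 14                  # end-of-line and hidden-number marker
PRINT, VAL, GO_TO = 245, 176, 236       # keyword token codes

def number(digits: str) -> bytes:
    """Visible digits, then the marker and five hidden bytes (whole numbers only)."""
    value = int(digits)
    return digits.encode("ascii") + bytes([NUMBER, 0, 0, value & 255, value >> 8, 0])

def line(line_no: int, text: bytes) -> bytes:
    """Line number (high byte first), length (low byte first), text, Enter."""
    body = text + bytes([ENTER])
    return line_no.to_bytes(2, "big") + len(body).to_bytes(2, "little") + body

val_2 = bytes([VAL]) + b'"2"'
line10 = line(10, bytes([PRINT]) + number("2") + b"+" + val_2)
line20 = line(20, bytes([GO_TO]) + number("10"))

print(list(line10))
print(list(line20))
print(len(number("2")), "bytes for 2;", len(val_2), 'bytes for VAL "2"')
```

The last line of output shows the saving discussed above: seven bytes
for the bare digit against four for the VAL version.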



The binary is in a rather odd format - one which is explained in Dr Ian

Logan's excellent book, Understanding Your Spectrum (published by

Melbourne House). Luckily, with the aid of Multisearch, you don't need

to understand the format to manipulate it.



The upshot is that numbers in ZX Basic programs need careful treatment,

as they can gobble up memory at an alarming rate. Some expressions for

numbers are even more concise than the 'VAL' version, because they use

the keyword PI instead of a number. PI only occupies one byte in a

program. The accompanying table lists a few common values and the

expressions to replace them, along with the number of bytes saved ('n'

represents any number).
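
The arithmetic behind such savings can be sketched in a few lines of
Python. The byte counts follow the storage rules described above; the
expressions NOT PI and INT PI are examples chosen for this sketch and
are not necessarily the ones in the table.

```python
MARKER_AND_BINARY = 6     # CHR$ 14 plus five hidden bytes after a literal number

def literal_cost(digits: str) -> int:
    """Bytes used by a number typed directly into a program line."""
    return len(digits) + MARKER_AND_BINARY

def val_cost(digits: str) -> int:
    """Bytes used by VAL "n": one token, two quotes and the digits."""
    return 1 + 2 + len(digits)

# Two one-byte keywords each; illustrative choices, not taken from the table.
KEYWORD_FORMS = {"0": "NOT PI", "1": "SGN PI", "3": "INT PI"}

for digits in ("0", "1", "3", "2323"):
    plain = literal_cost(digits)
    print(digits, "literal:", plain, "VAL:", val_cost(digits),
          "saves", plain - val_cost(digits))
    if digits in KEYWORD_FORMS:
        print("   ", KEYWORD_FORMS[digits], "takes 2 bytes, saves", plain - 2)
```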



You could use variables with preset values instead of numbers to get a

similar saving in space, but beware - ZX Basic is rather slow at finding

the value of variables; expressions like SGN PI may be worked out more

quickly, especially if your code uses lots of variables anyway.



Interestingly, values expressed using the BIN function are also stored

in two forms, so that BIN 1 soaks up eight bytes - one for the keyword,

one for the digit, and an extra six for the genuine binary form.



The line numbers at the start of each line are stored in a more sensible

'packed' format - each number occupying just two bytes. They are

converted into decimal by the LIST routine in the ROM. The two bytes

after each line number hold the length of the line, so that Basic can

skip quickly from one line to the next. An 'ENTER' character is at the

end of every line. This format is briefly explained in the Spectrum

manual, on page 166.
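
To show how the length bytes let a routine hop from line to line, here
is a small Python sketch that walks a program image. It assumes the
byte order given in the manual (line number high byte first, length low
byte first, and a length that counts the final Enter); the function
name and the sample bytes are made up for the purpose.

```python
def lines(program: bytes):
    """Yield (line number, text without the Enter) for each line in turn."""
    pos = 0
    while pos < len(program):
        line_no = int.from_bytes(program[pos:pos + 2], "big")
        length = int.from_bytes(program[pos + 2:pos + 4], "little")
        text = program[pos + 4:pos + 4 + length]
        yield line_no, text[:-1]        # the last byte of the text is the Enter
        pos += 4 + length               # jump straight to the next line

sample = bytes([0, 10, 2, 0, 226, 13,   # 10 STOP  (226 is the STOP token)
                0, 20, 2, 0, 226, 13])  # 20 STOP
for line_no, text in lines(sample):
    print(line_no, list(text))
```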



The first program given is a simple loader which will store the machine

code for Multisearch at address 30000. To use it, simply RUN the program

and if you've made no typing mistakes, the correct code will be stored.

If there's a mistake in the data, an appropriate message should appear.

It's wise to SAVE the program as soon as it has apparently run

correctly, just in case an error has slipped through. If you save the

code you can then load it again - without the Basic - at any address.





MULTISEARCH ON THE RUN



The routine is very easy to use, and all you need to do is load the code

into any free area of memory. It's 225 bytes long, so if you've already

got another machine code routine from address 53246 onwards, you might

CLEAR 53020 and load the code at 53021. Multisearch will work happily on

a 16K computer. If you're really pushed for space you could load it into

the printer buffer at 23296, so long as you don't use the printer until

you've finished with Multisearch.
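
The address sums are easy to get wrong by one, so here is the same
calculation as a short Python sketch. The function name is invented;
the figures are the ones quoted above.

```python
ROUTINE_LENGTH = 225      # size of the Multisearch code in bytes

def placement(next_routine: int):
    """Return (CLEAR address, load address) to sit just below next_routine."""
    load_at = next_routine - ROUTINE_LENGTH
    return load_at - 1, load_at

clear_to, load_at = placement(53246)
print("CLEAR", clear_to, "then load the code at", load_at)
```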



Wherever it ends up, you call the routine by jumping to its start - with

RANDOMIZE USR 53021, for example. But before you do this you must tell

Multisearch the text you want to alter. You do this by setting the Basic

variables S$ and R$.



Logically enough, S$ should contain the text you want to search for, and

R$ should contain the replacement. This is the essence of the power of

Multisearch - the text can be program-generated, so you're not just

limited to what you can type in. You can enter keywords in strings by

typing THEN (Symbol Shift 'G'), followed by the keyword, and then

stepping back to scrub out the THEN before you press Enter.



If you load Multisearch into the printer buffer you could try it out

with this simple program:



10 LET S$="OLD TEXT"

20 LET R$="NEW TEXT"

30 RANDOMIZE USR 23296



When you RUN the program and LIST it you'll find that S$ and R$ now refer

to the same text. Of course, S$ and R$ don't have to be the same length.

The only restrictions are that both strings must be less than 256

characters long, and S$ mustn't be empty (!). In either case,

Multisearch detects the problem before it tries to alter anything, and

reports a 'Parameter error'. If either S$ or R$ is not set, you'll receive a

'Variable not found' message and the program will be unchanged.



Multisearch is very fast, but it can take a few seconds to make major

changes to a long program. You can break into it while it's working by

pressing the Space key. The routine stops once it's made the current

change and spits out a 'Break into program' message. If the routine runs

out of room to make changes it'll do as much as it can and then report

'Out of memory'.



It's important to realise that Multisearch doesn't check the syntax of

lines as it alters them - this would make it slow and much less

versatile. However, it means that you can thoroughly mess up a program

by, say, changing all the LET keywords into POKEs.



If you corrupt a program in this way you'll get a 'Nonsense in Basic'

error when you try to RUN it. Be careful if you change the keywords back

automatically - you could end up changing genuine POKEs into 'nonsense'

LETs. The moral of the story is to be careful before you use Multisearch

... if in doubt, SAVE your Basic before you mangle it.





TRICKY DIGITS



This business of using strings is all very well, but it doesn't help us

replace numbers in program lines. We can't store a number in a string

without putting it in quotes (or using STR$). LET A$="1" is OK, but LET

A$=1 gives an error, and we've already discovered that numbers outside

quotes have a special format. To illustrate this, try out the following

program:



10 LET S$="40"

20 LET R$="60"

30 RANDOMIZE USR 23296

40 PRINT "Hello";

50 GO TO 40

60 STOP



When you RUN this program it'll replace the text '40' in line 50 with

the text '60'. However, it won't replace the hidden binary form; the

program still prints out 'Hello' over and over again, because ZX Basic

uses the binary form of the line number (still 40), and ignores the text

completely. You end up with a line that reads GO TO 60 and performs a GO

TO 40!
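
The mismatch between what LIST shows and what RUN obeys can be modelled
in Python. This is only a sketch: the function name is invented, and it
assumes the whole-number form of the hidden bytes (value in the third
and fourth bytes, low byte first).

```python
NUMBER = 14

def listed_and_executed(text: bytes):
    """Return the digits LIST shows and the value held in the hidden bytes."""
    mark = text.index(NUMBER)
    start = mark
    while start > 0 and 48 <= text[start - 1] <= 57:   # step back over digits
        start -= 1
    shown = text[start:mark].decode("ascii")
    hidden = text[mark + 1:mark + 6]
    return shown, hidden[2] + 256 * hidden[3]

go_to_40 = bytes([236]) + b"40" + bytes([NUMBER, 0, 0, 40, 0, 0, 13])
patched = go_to_40.replace(b"40", b"60")    # a text-only change
print(listed_and_executed(go_to_40))
print(listed_and_executed(patched))
```

After the text-only change the listed digits read 60 while the hidden
value is still 40.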



This is a very useful trick to discourage people from editing your

programs - you can jumble up the text of the line numbers but the

program will still work correctly because the binary forms are

unchanged. The hidden binary is removed when a line is edited (to stop

it getting in the way as you move along the line) and the binary is

re-calculated from the text when you press Enter. This means that the

jumbled values are taken literally after a line is edited, changing the

way the program works and hence discouraging fiddlers.



You can save a little memory by replacing the text of each number by a

single digit. However, you can't dispense with the text altogether -

there must be some numeric text between the GO TO and the CHR$ 14, or

Basic will spot the subterfuge and give the game away with a 'Nonsense

in Basic' error.





BINARY CHOICE



We still can't alter numbers properly. The routine so far will only

change text within a program ... it can't replace the binary form of

numbers. The solution is to distinguish between numbers and strings, and

use a small Basic program to work out the binary form of a number. An

appropriate routine is given, which should be MERGEd with your Basic

program once the Multisearch code is loaded.



Rather than use a complicated routine to generate binary forms, this

program 'cheats' by storing the required number in a variable and then

PEEKing the contents of the variable area (which always contains binary

values in the same form as that used within programs).



To use the program, type GO TO 9990 and press 'T' or 'N' to indicate

whether you want to search for text or a number. Then type the data

required, exactly as it appears in the program. If you select 'N', the

program adds the numeric form to S$. Next you specify the replacement,

which may (once again) be text or a number. The program STOPs once the

requested changes have been made.
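
For readers curious about what ends up in S$ when 'N' is selected, here
is a hedged Python sketch of the pattern. It does not PEEK anything: it
works the hidden bytes out directly, which is a different technique
from the Basic routine, and it only covers whole numbers from 0 to
65535. Other values use a floating-point layout that this sketch does
not attempt. The function names are invented.

```python
NUMBER = 14

def binary_form(value: int) -> bytes:
    """Five hidden bytes for a whole number in the range 0 to 65535."""
    if not 0 <= value <= 65535:
        raise ValueError("this sketch only handles whole numbers 0..65535")
    return bytes([0, 0, value & 255, value >> 8, 0])

def number_pattern(digits: str) -> bytes:
    """Digits as typed, the CHR$ 14 marker, then the hidden binary form."""
    return digits.encode("ascii") + bytes([NUMBER]) + binary_form(int(digits))

s = number_pattern("40")      # what S$ would hold for the number 40
r = number_pattern("60")      # what R$ would hold for the number 60
print(list(s))
print(list(r))
```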



This technique is not ideal, but it does allow numbers to be changed

properly without denying you the ability to alter numeric text and leave

binary forms unchanged. If you need to process a pattern which contains

a number, you'll need to add other characters around the search or

replacement string, using the normal Spectrum string handling commands.



You can use the 'binary form' program as a subroutine if you replace the

STOP in line 9902 with a RETURN and get rid of the CLEAR statement in

line 9900. However, you must make sure that V is the first variable

encountered when your program is RUN. The routine finds the binary form

of a number by storing it in variable V, and then PEEKing the first

entry in the variable table. If V isn't the first entry you'll get

incorrect results.





ASSEMBLER LISTING



Multisearch uses a number of interesting routines and could form the

basis of a complete Basic toolkit. The assembly code of the routine,

produced by the whizzo new Microdrive version of the Picturesque Editor

Assembler, is a little more repetitious than it need be, since it's

written in relocatable code. This means it'll run anywhere in memory

without modification, but also that it can't use any internal subroutine

calls, since the location of each subroutine is not fixed.



Broadly speaking, the program can be divided into two sections. The

first part (up to the label LINE) is used to find the variables S$ and

R$ and check that they contain correct values. The code to find S$ is

duplicated to locate R$ - the only difference is the letter of the name

and the extra check to make sure that S$ contains at least one

character.



At FINDS, the program points HL into the variable area and then looks

for a capital 'S'. This indicates the start of the storage allocated to

S$, as explained on page 168 of the Spectrum manual. The ROM routine

F_VAR is used to step from one entry to the next until the required

letter is found, or the end of the table is reached - in which case a

'Variable not found' error is generated.



Strings stored in the variable area are preceded by their length,

recorded in two bytes in normal Z80 fashion - low byte first.

Multisearch can't cope with strings of more than 255 bytes (the code is

kept simple!) so it generates a 'Parameter error' if the most

significant byte of either string length is not zero. If all goes well

IX is left pointing to the text of S$.
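
The search through the variable area can be modelled in Python. This is
a sketch of the idea rather than a translation of the machine code: the
entry sizes follow the variable formats described in the manual as I
read them, the function name is invented, and the ROM routine itself is
not reproduced.

```python
def find_string(variables: bytes, letter: str) -> bytes:
    """Return the text of a string variable such as S$ or R$."""
    wanted = ord(letter.upper())           # strings are marked by a capital letter
    pos = 0
    while variables[pos] != 0x80:          # 0x80 marks the end of the variables
        head = variables[pos]
        kind = head >> 5                   # top three bits give the entry type
        if kind == 0b010:                  # string: length (low, high), then text
            low, high = variables[pos + 1], variables[pos + 2]
            if head == wanted:
                if high != 0:              # 256 characters or more
                    raise ValueError("Parameter error")
                return variables[pos + 3:pos + 3 + low]
            pos += 3 + low + 256 * high
        elif kind == 0b011:                # single-letter number: five bytes
            pos += 6
        elif kind == 0b111:                # FOR-NEXT control variable
            pos += 19
        elif kind == 0b101:                # long-named number
            pos += 1
            while variables[pos] < 0x80:   # last letter of the name has bit 7 set
                pos += 1
            pos += 6
        else:                              # arrays carry a two-byte length
            pos += 3 + variables[pos + 1] + 256 * variables[pos + 2]
    raise KeyError("Variable not found")

area = (bytes([ord("v"), 0, 0, 5, 0, 0])          # v = 5
        + b"S" + bytes([8, 0]) + b"OLD TEXT"      # S$
        + b"R" + bytes([8, 0]) + b"NEW TEXT"      # R$
        + bytes([0x80]))
search, replacement = find_string(area, "S"), find_string(area, "R")
if not search:
    raise ValueError("Parameter error")    # S$ must not be empty
print(search, replacement)
```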



From NEXT2 onwards the routine looks for R$. The address of the string

(a pointer to the length, in this case) is stored at R_LEN, at the end

of a Basic work area called MEMBOT. DE is pointed just before the start

of the Basic program (as if the Enter at the end of a previous line had

just been reached) and the main loop through the program begins at LINE.



At LINE the routine expects the end of a line and the start of a new

one. It skips over three bytes - the Enter and line number - and stores

a pointer to the line length in L_LEN. We need to know where the line

length is recorded since we may need to alter it if we add or delete

characters in the line.



FIND is the point at which Multisearch tries to locate the search

string. DE is saved, so that we know where the match did (or didn't)

occur, and then the loop at MATCH is used to see if the characters from

DE onwards match those from IX onwards. Register B contains the length

of S$. If the comparison fails before B reaches zero, the program leaps

off to GO_ON, but if all goes well, the length of R$ is fetched and

compared with that of S$. If the two are the same, execution continues

at NO_OK (pronounced 'number OK'!) - otherwise some characters must be

inserted or deleted so that the replacement text fits in the line.



The job of adding or removing characters is not trivial, since any

change in the program size also alters the location of variables, and

other useful pieces of information. Luckily, ROM routines exist to

adjust the program size and make sure that nothing gets lost. SHRNK and

XPAND remove or add BC characters at the location pointed to by HL.

XPAND produces an 'Out of memory' error if there's no room for the extra

characters. If S$ and R$ are different lengths then Multisearch must

adjust the line length (as explained earlier) and alter the pointers to

S$ and R$. Any movement of the program also sends the variables skidding

around memory, since they're stored at the end of the program. This took

a little while to puzzle out when we tested the machine code!
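
The bookkeeping can be pictured with a tiny Python sketch. The
addresses are hypothetical, picked only to show which pointers move:
anything stored above the point of the change shifts by the difference
in length, while the line length slot, which lies below it, stays put.

```python
def adjust(pointers, change_at, delta):
    """Shift every address above the edit point by delta bytes."""
    return {name: (addr + delta if addr > change_at else addr)
            for name, addr in pointers.items()}

# Hypothetical addresses, chosen only for illustration.
before = {"line_length": 23759, "S$": 24100, "R$": 24111}
after = adjust(before, change_at=23800, delta=3)   # R$ is 3 bytes longer than S$
print(after)
```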



A couple of extra jumps are located between the Delete and Insert

instructions - the main loop is too long to be traversed in a single

relative jump (it can only cross 126 bytes at one mighty bound) so FINDX

and LINEX are used as 'staging posts' on the way to FIND and LINE

respectively.



Various paths meet at NO_OK. At this point a correct match has been

found and the address on the stack points to the place where R$ must be

stored. An LDIR is used to copy the new text into the program. This

leaves DE pointing to the character after the new data, from whence the

search can re-start. If S$ didn't match the program we have to advance

DE and start again one byte further through the program. This step is

performed at GO_ON.



Whether or not a match was found, we end up at NEXT, where the Break key

is polled in case the user has decided to give up. The routine stops

with a BREAK error if bit zero at port address 32766 (the Space key) is

reset. At CONT the contents of the system variable VARS are compared

with the address in DE. If DE is pointing into the variable area we've

finished, and the routine RETurns. Otherwise we must look further

through the program, although before that we check for a couple of

'special cases'. If DE points to an 'ENTER' character we've reached the

end of a line, so we should pick up the new line length by looping back

to LINE.



If DE points at a number marker - CHR$ 14 - we must skip over the binary

data since it could contain values which appear to be text or keywords,

but aren't really. This doesn't stop us finding numbers, since those

will always start with an ASCII character (probably a digit). If we've

reached the CHR$ 14 we've gone too far.
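
Pulling the walk-through together, the main loop might be modelled in
Python along these lines. It is a sketch under stated assumptions, not
the published routine: the Break key check and the ROM calls are left
out, the program is a plain byte string rather than live memory, and
S$ is assumed not to contain an Enter character. The label names in the
comments refer to the description above.

```python
ENTER, NUMBER = 13, 14

def multisearch(program: bytes, search: bytes, replacement: bytes) -> bytes:
    """Replace search with replacement in the text of every program line."""
    if not search or len(search) > 255 or len(replacement) > 255:
        raise ValueError("Parameter error")
    prog = bytearray(program)
    pos = 0
    while pos < len(prog):                     # LINE: start of a new line
        len_at = pos + 2                       # where this line's length lives
        pos += 4                               # step over number and length
        while prog[pos] != ENTER:
            if prog[pos] == NUMBER:            # hidden binary form: skip it
                pos += 6
            elif prog[pos:pos + len(search)] == search:
                prog[pos:pos + len(search)] = replacement
                length = int.from_bytes(prog[len_at:len_at + 2], "little")
                length += len(replacement) - len(search)
                prog[len_at:len_at + 2] = length.to_bytes(2, "little")
                pos += len(replacement)        # carry on after the new text
            else:
                pos += 1                       # GO_ON: try one byte further on
        pos += 1                               # step over the Enter
    return bytes(prog)

# 50 GO TO 40, complete with its hidden binary form
line50 = bytes([0, 50, 10, 0, 236]) + b"40" + bytes([NUMBER, 0, 0, 40, 0, 0, ENTER])
print(list(multisearch(line50, b"40", b"60")))
print(list(multisearch(line50, b"40", b"100")))
```

In the second call the replacement is one byte longer than the search
text, so the line length stored after the line number rises from 10 to
11 while the hidden value stays at 40.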





POSSIBLE IMPROVEMENTS



There are lots of ways in which Multisearch could be improved, but the

existing code works and it doesn't take long to type in! It might be

useful to make it return a count of the number of replacements found,

and perhaps a list of the lines in which changes were made. It would be

convenient (but perhaps rather difficult) to re-code the 'binary form'

program in machine code.



As it stands, Multisearch is a simple but very effective routine with a

multiplicity of uses. There can't be many short routines which can be

used to make ZX Basic edit-proof, faster, more concise, more readable,

and more versatile. Do let me know what you make of Multisearch.



