Issue 6 repair with a twist...
(Starting with a slight grump - the forum ate my first attempt at this so this'll probably end up being a bit terse :))
I got an Issue 6 48K machine in a couple of weeks ago, which was sold as faulty due to coloured blocks appearing on the screen when running:

I suspected faulty upper RAM trashing the stack, and this was confirmed by running my diags board which pinpointed a dodgy IC19.
This was replaced with a fresh 4164; however, the coloured blocks remained, and worse still, the diags were reporting that both lower and upper RAM were healthy.
At this point I started wondering whether the ULA was flaky, or if there was an issue with the address or data buses, until I realised that the coloured blocks were actually always cyan when the paper was white.
Now suspecting a lower RAM issue, I entered and ran a simple BASIC program to fill the screen with some text, and indeed this confirmed that if there was corruption, it always affected bit 4 of the byte, corresponding to IC10:

In this case certain pieces of data handled by IC10 were returning 0 regardless of what was written, resulting in random blank pixels and white paper/black ink (00111000) being corrupted to cyan/black (00101000).
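To make the bit arithmetic concrete, here's a rough Python sketch (a host-side model, not firmware) of how a bit stuck at 0 in position 4 turns a white-paper attribute into a cyan one. The helper names are made up for illustration; the attribute layout is the standard FLASH/BRIGHT/PAPER/INK arrangement.

```python
# Spectrum attribute byte: FLASH (bit 7), BRIGHT (bit 6), PAPER (bits 5-3), INK (bits 2-0).
COLOURS = ["black", "blue", "red", "magenta", "green", "cyan", "yellow", "white"]


def decode_attribute(attr):
    """Return (paper, ink) colour names for an attribute byte."""
    paper = (attr >> 3) & 0b111
    ink = attr & 0b111
    return COLOURS[paper], COLOURS[ink]


def stuck_at_zero(value, bit):
    """Model a RAM cell that always reads back 0 on the given data bit."""
    return value & ~(1 << bit) & 0xFF


good = 0b00111000                 # white paper, black ink
bad = stuck_at_zero(good, 4)      # what a cell with bit 4 stuck at 0 returns

print(f"{good:08b} -> {decode_attribute(good)}")
print(f"{bad:08b} -> {decode_attribute(bad)}")
```

Bit 4 is the red component of the paper colour, so losing it from white leaves green plus blue, i.e. cyan.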
One swift 4116 replacement later, and the machine was fault free and fully functional.
It bothers me, however, that the diags board didn't pick up this failure, especially as the firmware does go to some lengths to ensure that memory is well exercised.
The test I expected to catch this issue is the random fill test, where a random number generator is loaded with a known seed and then used to fill the area of RAM being tested with 'random' numbers. The seed is then reinitialised, and the RAM being tested is compared against the output of the random number generator. In a healthy machine, these should always match.
It appears that, as the seed is always the same, the sequence of random numbers generated always ends up masking the failure in bit 4 by generating values that are 0 in that bit position in the faulty areas.
Maybe a soak test where the random number generator is seeded with a series of values might be of value - I've retained the faulty IC anyhow, so I can test any changes and whether updated code will catch the fault.
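As a rough illustration of the masking effect and of the soak idea, here's a small Python model (again host-side, not the firmware). Everything in it is an assumption made for the sketch: the 16-bit LCG is an arbitrary stand-in for the real generator, and the faulty addresses are deliberately picked to line up with zeros in the fixed-seed stream.

```python
def prng_bytes(seed, count):
    """Hypothetical 16-bit LCG standing in for the firmware's generator."""
    state = seed & 0xFFFF
    out = []
    for _ in range(count):
        state = (state * 25173 + 13849) & 0xFFFF
        out.append(state >> 8)          # use the high byte as the test value
    return out


class FaultyRam:
    """RAM model in which one data bit is stuck at 0 at certain addresses."""

    def __init__(self, size, stuck_bit, bad_addresses):
        self.cells = [0] * size
        self.mask = ~(1 << stuck_bit) & 0xFF
        self.bad = set(bad_addresses)

    def write(self, addr, value):
        self.cells[addr] = value & self.mask if addr in self.bad else value

    def read(self, addr):
        return self.cells[addr]


def random_fill_test(ram, size, seed):
    """Fill with the PRNG stream, re-seed, compare. True means 'RAM looks OK'."""
    data = prng_bytes(seed, size)
    for addr, value in enumerate(data):
        ram.write(addr, value)
    return all(ram.read(addr) == value for addr, value in enumerate(data))


def soak_test(ram, size, seeds):
    """Run the random fill test once per seed; return the first seed that fails."""
    for seed in seeds:
        if not random_fill_test(ram, size, seed):
            return seed
    return None


SIZE = 256
FIXED_SEED = 0x1000
SOAK_SEEDS = [0x1000, 0x2000, 0x3000, 0x4000]

# Contrived on purpose: the fault sits where the fixed-seed stream has bit 4 clear.
stream = prng_bytes(FIXED_SEED, SIZE)
bad = [addr for addr, value in enumerate(stream) if not value & 0x10][:8]

print(random_fill_test(FaultyRam(SIZE, 4, bad), SIZE, FIXED_SEED))  # True: fault masked
print(hex(soak_test(FaultyRam(SIZE, 4, bad), SIZE, SOAK_SEEDS)))    # 0x2000 exposes it
```

With a single fixed seed the test passes every time, however often it is repeated; varying the seed changes which bits get written as 1 at the faulty addresses, so the stuck bit eventually shows up.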
One last oddity - the case in which this board resides has two stamped serial numbers:

Anyone seen this before?
B
Post edited by balford on
The Spectrum Resuscitation Thread - bringing dead Spectrums back to life
zx-diagnostics - Fixing ZX Spectrums in the 21st Century (wiki)
Sinclair FAQ Wiki
Comments
Also, the so-called march algorithm can be implemented in different ways and at different levels of detail. It has been optimised by industry, where testing time matters, which shouldn't be relevant for Speccy users. :D
Here is a good summary of the possible failures, how to detect them, and how long a thorough test takes depending on the number of memory cells:
www.ece.uc.edu/~wjone/Memory.pdf
B
You could try this out, but you'd have to type it in (it's not too much).
I think this is quite a detailed test that detects many errors, using 4 different patterns and a walking bit test. It's not too much, just 2.5 pages of assembly instructions. You could sell it afterwards. :-D
http://www.ballyalley.com/ml/ml_source/RAM%20Test%20[From%20Z80%20Assembly%20Language%20Subroutines].pdf
The board and firmware were originally developed by Winston on this forum (who incidentally went on to develop the Spectranet :)). Details here, and especially here for details of the memory testing.
All I've done is have a few of the PCBs manufactured, and make the firmware a bit more user friendly and 128k aware - the updates are currently with Winston, who needs to update his site before making them publicly available.
Guesser is also working on laying out a reduced footprint version with SOIC/PLCC32 parts as the DIP flash chips are getting hard to find - thread on that here.
B
Well, my intention was to motivate you to work on this project.
I would like to have that code but I'm too lazy to type it in. :D
I just took a look at the code you pointed me to.
http://www.comportec.de/diag/www.alioth.net/Projects/Spectrum-Diag/software.html
If that is the actual code, there is a problem with how the patterns are tested.
All patterns are written and read in only ONE direction (upwards).
This can't find cells that are influenced by cells at a higher address.
Assume you write $88 to address $4400 and this also overwrites the cell at $4800. You won't find this error at all with this test when you fill upwards only, because you later overwrite that cell yourself with new data, and when testing you read all the data back correctly. If a write to $4400 overwrites $4800, it does not mean that a write to $4800 will overwrite $4400. It may, or it may not. The influence is not always in both directions. In any case, you won't find this error by filling the memory with the same data.
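Here's a quick Python model of that $4400/$4800 example, just to show the effect on a host machine. It's a sketch under stated assumptions: the `CoupledRam` class and `fill_then_verify_up` helper are invented for illustration and only mimic the fill-then-read structure described above, not the actual firmware.

```python
class CoupledRam:
    """RAM model where a write to the aggressor address also lands in the victim."""

    def __init__(self, size, aggressor, victim):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def write(self, addr, value):
        self.cells[addr] = value
        if addr == self.aggressor:
            self.cells[self.victim] = value   # the coupling fault

    def read(self, addr):
        return self.cells[addr]


def fill_then_verify_up(ram, start, length, pattern):
    """Fill upwards with one pattern, then read upwards; return failing addresses."""
    for addr in range(start, start + length):
        ram.write(addr, pattern)
    return [addr for addr in range(start, start + length)
            if ram.read(addr) != pattern]


ram = CoupledRam(0x10000, aggressor=0x4400, victim=0x4800)

# The pattern test sees nothing wrong: $4800 is rewritten with the same value later.
print(fill_then_verify_up(ram, 0x4000, 0x4000, 0x88))   # []

# Yet the fault is real: clear $4800, write $4400, and $4800 has changed.
ram.write(0x4800, 0x00)
ram.write(0x4400, 0x88)
print(hex(ram.read(0x4800)))                            # 0x88
```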
So one problem is that this memory test goes upwards only. The other problem is that it fills all cells with a pattern and then only reads the pattern back. Real memory tests check AND change each cell in the same pass. So the simple memory test of the ZX81, which fills cells with $02 and decrements them twice, is actually better than this incorrectly implemented pattern test.
This could maybe be detected with the random pattern, but only if the data is really random AND if it is done in BOTH directions. The random data is written in only one direction (downwards); it is read back and compared upwards, but that is a read operation only. I didn't check whether the generated data is really random, and I have my doubts, but it's too hard for me to check in detail right now.
Very weird ASM programming that I have never seen before. There are macros defined which are included very often, which results in a lot of duplicated code. Maybe Winston was trying to get the 8k ROM nearly full this way. :-D
So maybe it's better to follow my advice and retype the code from the book I posted, or better still, use the march algorithm. Let's have a look at it in detail:
This means in fact:
First fill all memory cells with 0 (or 0x00) - step 1.
Then read every cell back and immediately write it with 1 (or 0xFF), in one specified order - step 2.
Then do the same in the opposite order while writing opposite data - step 3.
Then finally read the opposite data again - step 4.
That way you can surely detect influenced cells.
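The four steps can be sketched in Python against a simulated RAM with a one-way coupling fault (a write to $4400 also landing in $4800). This is only a host-side model to show why reading and rewriting each cell in the same pass catches the fault; the `Ram` class and the tuple returned by `march_x` are assumptions made for the example.

```python
class Ram:
    """RAM model; optionally a write to 'aggressor' also overwrites 'victim'."""

    def __init__(self, size=0x10000, aggressor=None, victim=None):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def write(self, addr, value):
        self.cells[addr] = value
        if addr == self.aggressor:
            self.cells[self.victim] = value

    def read(self, addr):
        return self.cells[addr]


def march_x(ram, start, length):
    """Return None if all steps pass, else (step, address, wrong_bits) for the first error."""
    addrs = range(start, start + length)

    for a in addrs:                      # step 1: write 0, ascending
        ram.write(a, 0x00)

    for a in addrs:                      # step 2: read 0, write 1, ascending
        got = ram.read(a)
        if got != 0x00:
            return (2, a, got ^ 0x00)
        ram.write(a, 0xFF)

    for a in reversed(addrs):            # step 3: read 1, write 0, descending
        got = ram.read(a)
        if got != 0xFF:
            return (3, a, got ^ 0xFF)
        ram.write(a, 0x00)

    for a in reversed(addrs):            # step 4: read 0, descending
        got = ram.read(a)
        if got != 0x00:
            return (4, a, got ^ 0x00)

    return None


print(march_x(Ram(), 0x4000, 0x4000))                                  # None
print(march_x(Ram(aggressor=0x4400, victim=0x4800), 0x4000, 0x4000))   # (2, 18432, 255)
```

In step 2 the write of 0xFF to $4400 spills into $4800, which is still expected to hold 0 when the ascending pass reaches it, so the error is caught at $4800 (18432 decimal).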
The other test is random in all aspects.
The intent of the macros is to allow simple and readable code - as the firmware must assume that all RAM is suspect, the testing routines do not perform call/return operations (or anything that uses the stack for flow control), and also must keep all data in registers. Usage of the available ROM space isn't a problem in this application :)
B
Okay - that's a point. But this can be handled through registers as well if you assume all memory is corrupt. IY is not used in the code. So loading IY with the desired return address and using JP (IY) at the end of the subroutine would do the job, too.
But as we know from Microsoft - why write programs with a few hundred bytes if you can do it in a few kbytes. :o
And I think there would be space for more tools in an 8k ROM than just a memory test. But the more important point is about the algorithm.
Doesn't the board display anything at all on the Spectrum?
Even if the memory (video memory) is okay?
When you say "working on it"... I've not done anything on it since and you never mailed me what you wanted me to do ;)
Let me gather my thoughts and I'll update the original thread or drop a PM later.
B
I've left the existing walk, inversion and random tests in place, although I now run a second random test that writes memory from the top of address space down, as suggested by PokeMon.
Here's the new code - note that this uses the same macro-ised structure inherited from the base code, and uses the high byte of IX to store a map of which bits are found faulty:
    ; Algorithm March X
    ; Step1: write 0 with up addressing order;
    ; Step2: read 0 and write 1 with up addressing order;
    ; Step3: read 1 and write 0 with down addressing order;
    ; Step4: read 0 with down addressing order.
    ;
    ; Credit - Karl (PokeMon) on WoS

    MACRO MARCHTEST start, len

    ; Step 1 - write 0 with up addressing order
    ; No errors expected with this part :)
    ld hl, start
    ld bc, len
.marchtest1.loop
    ld (hl), 0
    inc hl
    dec bc
    ld a, b
    or c
    jr nz, .marchtest1.loop

    ; Step 2 - read 0 and write 1 with up addressing order
    ld hl, start
    ld bc, len
.marchtest2.loop
    ld a, (hl)
    cp 0
    jr z, .marchtest2.next
    MARCHBORKED             ; a = bits that failed to read back as 0
    jp .marchtest.done
.marchtest2.next
    ld a, 0xff
    ld (hl), a
    inc hl
    dec bc
    ld a, b
    or c
    jr nz, .marchtest2.loop

.marchtest3.start
    ; Step 3 - read 1 and write 0 with down addressing order
    ld hl, start
    ld bc, len - 1
    add hl, bc              ; hl = last address in the block
    ld bc, len              ; count every cell, including the one at 'start'
.marchtest3.loop
    ld a, (hl)
    cp 0xff
    jr z, .marchtest3.next
    xor 0xff                ; a = bits that failed to read back as 1
    MARCHBORKED
    jp .marchtest.done
.marchtest3.next
    xor a
    ld (hl), a
    dec hl
    dec bc
    ld a, b
    or c
    jr nz, .marchtest3.loop

.marchtest4.start
    ; Step 4 - read 0 with down addressing order
    ld hl, start
    ld bc, len - 1
    add hl, bc              ; hl = last address in the block
    ld bc, len              ; count every cell, including the one at 'start'
.marchtest4.loop
    ld a, (hl)
    cp 0
    jr z, .marchtest4.next
    MARCHBORKED             ; a = bits that failed to read back as 0
    jp .marchtest.done
.marchtest4.next
    dec hl
    dec bc
    ld a, b
    or c
    jr nz, .marchtest4.loop

.marchtest.done
    ENDM

    MACRO MARCHBORKED
    ; Merge the faulty bits in a into the map held in ixh, then flag the error
    ld b, a
    ld a, ixh
    or b
    ld ixh, a
    ld a, BORDERRED
    out (ULA_PORT), a
    ENDM
B
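For anyone wondering how the fault map kept in the high byte of IX turns into a chip to replace, here's a small Python sketch of the idea. The accumulation (OR-ing in the bits that differ from the expected value) mirrors what the macro is meant to do; the IC numbering is an assumption extrapolated from the one data point in this thread (bit 4 is IC10 in lower RAM), taking the chips to be numbered consecutively from bit 0 at IC6.

```python
LOWER_RAM_FIRST_IC = 6   # assumption: IC6 holds bit 0 ... IC13 holds bit 7


def accumulate(fault_map, expected, actual):
    """OR the bits that differ from the expected value into the running fault map."""
    return fault_map | ((expected ^ actual) & 0xFF)


def faulty_ics(fault_map, first_ic=LOWER_RAM_FIRST_IC):
    """List the IC designators for every bit set in the fault map."""
    return [f"IC{first_ic + bit}" for bit in range(8) if fault_map & (1 << bit)]


fault_map = 0
fault_map = accumulate(fault_map, 0xFF, 0xEF)   # a cell read back with bit 4 missing
fault_map = accumulate(fault_map, 0x00, 0x00)   # a good read adds nothing

print(f"{fault_map:08b}", faulty_ics(fault_map))   # 00010000 ['IC10']
```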
Nice work.
I thought about the other tests that are implemented. The walking bit test does not make much sense when using single-bit memory devices, unless you expect an error in the external addressing logic. But there may be Spectrum versions using 8-bit-wide RAM where it could be helpful.
The random test (writing random data to memory) isn't that bad, and I checked the deterministic random generator. But if a random test is written and verified, it should be done in both directions (up and down), and it should be given a random start value (or perhaps a start value that increases with every new test).
It could be that the particular data written to a memory cell doesn't expose a fault on one run, so that only a second or third run with different data throws up the error. If the same fixed data is always written, whether a fault is detected comes down to happenstance. The price of varying the data is that an error is not necessarily detected on every run, which is why it makes sense to run a second or third test. On the other hand, if the bits are already tested thoroughly, this isn't really necessary; it's just another implementation.