Comments

Apropos of nothing, but sort of related given the routines on this page, I saw quite an interesting fixed point hardware divider module on OpenCores. Instead of just a simple divider module that does x/y, it multiplies x by the reciprocal (in other words, x * 1/y) and caches the result of 1/y.
The thought behind this is that quite a lot of the time you use the same denominator (for example in a loop), so you do the expensive division operation once, and then the remaining times (at least while the result of 1/y is still in the cache) you do much cheaper multiply operations.
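In C terms, the trick looks roughly like this - a minimal sketch with invented names (the actual OpenCores module is written in an HDL and handles the corner cases properly):

    #include <stdint.h>

    /* Sketch of the reciprocal-caching idea in 16.16 fixed point.
       All names here are invented for illustration. */
    static uint32_t cached_y;       /* last divisor seen (0 = nothing cached)  */
    static uint32_t cached_recip;   /* 1/y in 16.16 format, i.e. (1 << 32) / y */

    /* x, y and the result are 16.16 fixed point; y must be non-zero and large
       enough that 1/y still fits in 16.16.  The result is truncated, so it can
       be a least-significant bit low compared with a true divide. */
    uint32_t fixed_div(uint32_t x, uint32_t y)
    {
        if (y != cached_y) {
            cached_y = y;
            cached_recip = (uint32_t)((1ULL << 32) / y);       /* expensive divide, once per new divisor */
        }
        return (uint32_t)(((uint64_t)x * cached_recip) >> 16); /* cheap multiply on every call */
    }

Dividing a whole batch of values by the same y then costs one divide plus one multiply per value, instead of one divide per value.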
That's a standard optimisation trick in many languages.
But the amount you gain from this optimisation is pretty spectacular in hardware: many FPGAs have hardware multipliers which can do a multiply in a single clock tick (and on an FPGA/ASIC that doesn't, many soft-core multipliers will do it in 3 cycles without using an absurd amount of logic blocks). Divides, on the other hand, can take 30-odd clocks on any hardware divider that doesn't use an absurd amount of resources, and while many FPGAs have multiplier blocks, I've not yet come across one with a divider.
Exactly! Strangely, simple optimisation concepts such as this are rarely taught today, leading to a sad lack of knowledge about the speed at which things are processed.
Not sure about that... it's a really simple idea that shows up over and over again under different names (memoization, dynamic programming, cache, etc) and is something taught in any data structures or algorithms course which every cs or engineering major takes.
This is fascinating although I am not sure I follow. Any chance of a more detailed description or link?
Many thanks
Paddy
Remember the results of long computations which you may need again soon so that you do not need to recompute them.
I'm not sure how much advantage it is to cache multiply / divide results on a z80 machine, given the limited memory and time to access a cache, but it will depend a lot on whether many multiplies / divides occur with the same operands close together. Storing a reciprocal for divisions with the same divisor might do ok if division takes a lot longer than multiplication.
With loop unrolling, leading zero elimination and reduction in operand sizes, you can speed up z80 routines two times or more (for example, against the baze link above) and the difference in time between multiply and divide reduces, which may make the additional overhead of cache lookup on every operation too large to be useful.
You won't know unless you try with a specific application in mind though.
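To make that overhead concrete, the kind of memo cache being discussed is essentially this (a rough C sketch with invented names, not code from the routines on this page):

    #include <stdint.h>

    /* One-entry memo cache in front of a multiply.  The lookup is paid on
       every call, so it only wins if repeated operands are common and the
       underlying operation is slow. */
    static uint8_t  last_a, last_b;     /* operands of the cached result */
    static uint16_t last_product;
    static int      cache_valid = 0;

    uint16_t mul8_cached(uint8_t a, uint8_t b)
    {
        if (cache_valid && a == last_a && b == last_b)
            return last_product;        /* hit: skip the slow routine */

        last_a = a;
        last_b = b;
        last_product = (uint16_t)a * b; /* stand-in for the slow multiply */
        cache_valid = 1;
        return last_product;
    }

On a Z80, the compares and loads in that lookup are exactly the per-call cost being weighed against the multiply or divide they might save.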
Maybe where you are, but of the many graduates I've worked with here in the UK in the past, most - but not all - were not familiar with such concepts, or even aware of the efficiency of their code. And this was in a large games company where such things can make a big difference, be that in the games themselves or the tools to support them.
It could just be down to the quality of grads being seen and hired.
The study of algorithms and complexity is really what CS is all about, so I think it's impossible to omit from any CS curriculum. On the other hand, it is entirely possible for students to graduate with a degree without having understood most of what has been taught.
Engineering education is more diffuse. I haven't seen a school that didn't devote one course to data structures / algorithms, but engineering schools quite often squeeze topics as needed into other courses, so it's not impossible that some schools may hide such a topic in another course. Honestly, I think engineering schools should probably go to five years rather than try to fit everything into four. The pedagogy has suffered terribly over the past 40 years because of trying to reduce and compress necessary topics into four years. I'm not speaking from the point of view of an academic but from the point of view of a student who had to go into the literature, sometimes all the way back to 1908, to find substantial pedagogical discussion on core EE topics.
A CS grad who doesn't know what a cache is has a gaping hole in their education, but so does an EE grad who doesn't know how a circuit works. I think the former is due to a poor student and the latter to weak pedagogy.
That's the first time I've seen the instruction SLIA.
It's known to the rest of the world as SLL, no idea where he got SLIA from
...
Well SLIA is certainly a more accurate description of what that operation does on a Z80. I'm pretty sure I also recall it being referred to as SLI (Shift Left Illogical) in the distant past - also more accurate than SLL :)
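For anyone who hasn't met the opcode: it shifts left one bit and forces bit 0 to 1 (with the old bit 7 going into the carry flag), which is why it isn't really a logical shift. Ignoring the carry, it amounts to this in C:

    #include <stdint.h>

    /* The undocumented Z80 shift (SLL / SLS / SLIA / SL1): shift left one
       bit and set bit 0.  Carry flag behaviour is not modelled here. */
    uint8_t sls(uint8_t value)
    {
        return (uint8_t)((value << 1) | 1);
    }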
This Z80 instruction was supposed to be named SLL (Shift Left Logical) officially. However, since it didn't work as intended, this original Your Spectrum article decided to name it SLS (Shift Left Set) instead.
Both names are quite popular and can be found in numerous references. There was no need to invent a new mnemonic for an old instruction!
Compilers that support it usually accept SLL though.
There's a surprising number of Google hits for "Shift Left Inverted Arithmetic" (well, it surprised me, anyway). I wonder where it stems from? Maybe the CP/M or TI-calc scenes or something.