fixed point maths

edited July 2014 in Development
I came across this site which is quite interesting
http://z80-heaven.wikidot.com/advanced-math#toc18
Post edited by seedy1812 on

Comments

  • edited July 2014
    Wow! This is useful, thanks.
    My blog (in Portuguese): Cantinho do TK90X.
  • edited July 2014
    Apropos of nothing, but sort of related given the routines on this page, I saw quite an interesting fixed point hardware divider module on OpenCores. Instead of just a simple divider module that does x/y, it multiplies x by the reciprocal (in other words, x * 1/y) and caches the result of 1/y.

    The thinking behind this is that, quite a lot of the time, you use the same denominator (for example, in a loop), so you do the expensive division operation once, and then on subsequent operations (at least while the result of 1/y is still in the cache) you do much cheaper multiply operations.
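    The scheme can be sketched in C with 16.16 fixed point. To be clear, the format, the single-entry cache, and all the names here are illustrative assumptions, not the actual interface of the OpenCores module:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t fix16;              /* 16.16 fixed point */
    #define FIX_ONE (1 << 16)

    static fix16 cached_y = 0;          /* last divisor seen */
    static fix16 cached_recip = 0;      /* cached 1/y in 16.16 */

    /* Divide by multiplying with a cached reciprocal: the expensive
       divide only runs when the divisor changes. */
    fix16 fix_div(fix16 x, fix16 y)
    {
        if (y != cached_y) {            /* miss: do the expensive divide once */
            cached_y = y;
            cached_recip = (fix16)(((int64_t)FIX_ONE << 16) / y);
        }
        /* hit path: one multiply instead of a divide */
        return (fix16)(((int64_t)x * cached_recip) >> 16);
    }

    int main(void)
    {
        /* 10.0 / 4.0 == 2.5 -> 0x00028000 in 16.16 */
        printf("%08x\n", (unsigned)fix_div(10 * FIX_ONE, 4 * FIX_ONE));
        /* same divisor: reuses the cached reciprocal, no divide */
        printf("%08x\n", (unsigned)fix_div(20 * FIX_ONE, 4 * FIX_ONE));
        return 0;
    }
    ```

    In hardware the win comes from the multiply being single-cycle while the divide is not; in software the same structure just trades one slow operation for many fast ones.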
  • edited July 2014
    Winston wrote: »
    Apropos of nothing, but sort of related given the routines on this page, I saw quite an interesting fixed point hardware divider module on OpenCores. Instead of just a simple divider module that does x/y, it multiplies x by the reciprocal (in other words, x * 1/y) and caches the result of 1/y.

    The thinking behind this is that, quite a lot of the time, you use the same denominator (for example, in a loop), so you do the expensive division operation once, and then on subsequent operations (at least while the result of 1/y is still in the cache) you do much cheaper multiply operations.

    That's a standard optimisation trick in many languages.
  • edited July 2014
    bobs wrote: »
    That's a standard optimisation trick in many languages.

    But the amount you gain from this optimisation is pretty spectacular in hardware: many FPGAs have hardware multipliers which can do a multiply in a single clock tick, and on an FPGA/ASIC that doesn't, many soft-core multipliers will do it in 3 cycles without using an absurd amount of logic blocks. Divides, on the other hand, can take 30-odd clocks on any hardware divider that doesn't use an absurd amount of resources, and while many FPGAs have multiplier blocks I've not yet come across one with a divider.
  • edited July 2014
    Winston wrote: »
    But the amount you gain from this optimisation is pretty spectacular in hardware: many FPGAs have hardware multipliers which can do a multiply in a single clock tick, and on an FPGA/ASIC that doesn't, many soft-core multipliers will do it in 3 cycles without using an absurd amount of logic blocks. Divides, on the other hand, can take 30-odd clocks on any hardware divider that doesn't use an absurd amount of resources, and while many FPGAs have multiplier blocks I've not yet come across one with a divider.

    Exactly! Strangely, simple optimisation concepts such as this are rarely taught today, leading to a sad lack of knowledge about the speed at which things are processed.
  • edited July 2014
    This is fascinating although I am not sure I follow. Any chance of a more detailed description or link?

    Many thanks

    Paddy
  • edited July 2014
    bobs wrote: »
    Exactly! Strangely, simple optimisation concepts such as this are rarely taught today, leading to a sad lack of knowledge about the speed at which things are processed.

    Not sure about that... it's a really simple idea that shows up over and over again under different names (memoization, dynamic programming, caching, etc.) and is taught in any data structures or algorithms course, which every CS or engineering major takes.
    This is fascinating although I am not sure I follow. Any chance of a more detailed description or link?

    Remember the results of long computations which you may need again soon so that you do not need to recompute them.

    I'm not sure how much advantage there is in caching multiply/divide results on a Z80 machine, given the limited memory and the time needed to access a cache, but it will depend a lot on whether many multiplies/divides occur with the same operands close together. Storing a reciprocal for divisions with the same divisor might do OK if division takes a lot longer than multiplication.

    With loop unrolling, leading-zero elimination and reduction in operand sizes, you can speed up Z80 routines two times or more (for example, against the baze link above), and the difference in time between multiply and divide shrinks, which may make the additional overhead of a cache lookup on every operation too large to be useful.

    You won't know unless you try with a specific application in mind though.
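    As a rough illustration of the trade-off being weighed here, a minimal memoization sketch in C (the cache size, the slot function, and the `expensive` stand-in are all made up for the example): counting how many times the slow path actually runs is exactly the number you would compare against the per-call lookup overhead on a Z80.

    ```c
    #include <stdint.h>
    #include <assert.h>

    #define CACHE_SLOTS 16

    static uint32_t cache_key[CACHE_SLOTS];
    static uint32_t cache_val[CACHE_SLOTS];
    static uint8_t  cache_used[CACHE_SLOTS];
    int slow_calls = 0;             /* how often the expensive path ran */

    /* stand-in for an expensive computation (e.g. a long division routine) */
    static uint32_t expensive(uint32_t n)
    {
        slow_calls++;
        return 0xFFFFFFFFu / (n | 1);
    }

    /* direct-mapped memo cache: cheap lookup, recompute only on a miss */
    uint32_t memoized(uint32_t n)
    {
        uint32_t slot = n % CACHE_SLOTS;
        if (cache_used[slot] && cache_key[slot] == n)
            return cache_val[slot];             /* hit: no recompute */
        cache_key[slot] = n;                    /* miss: compute and store */
        cache_val[slot] = expensive(n);
        cache_used[slot] = 1;
        return cache_val[slot];
    }

    int main(void)
    {
        for (int i = 0; i < 100; i++)
            memoized(7);            /* same operand over and over, as in a loop */
        assert(slow_calls == 1);    /* expensive path ran only once */
        return 0;
    }
    ```

    If the operands rarely repeat, `slow_calls` approaches the call count and the lookup is pure overhead, which is the point being made about trying it with a specific application in mind.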
  • edited July 2014
    Not sure about that... it's a really simple idea that shows up over and over again under different names (memoization, dynamic programming, cache, etc) and is something taught in any data structures or algorithms course which every cs or engineering major

    Maybe where you are, but of the many graduates I've worked with here in the UK in the past, most - but not all - were not familiar with such concepts, or even aware of the efficiency of their code. And this was in a large games company where such things can make a big difference, be that in the games themselves or in the tools to support them.
  • edited July 2014
    bobs wrote: »
    Maybe where you are, but of the many graduates I've worked with here in the UK in the past, most - but not all - were not familiar with such concepts, or even aware of the efficiency of their code.

    It could just be down to the quality of grads being seen and hired.

    The study of algorithms and complexity is really what CS is all about so I think it's impossible to omit from any CS curriculum. On the other hand, it is entirely possible for students to graduate with a degree without having understood most of what has been taught.

    Engineering education is more diffuse. I haven't seen a school that didn't devote one course to data structures / algorithms but engineering schools quite often squeeze topics as needed into other courses, so it's not impossible that some schools may hide such a topic in another course. Honestly I think engineering schools should probably go to five years rather than try to fit into four years. The pedagogy has suffered terribly over the past 40 years because of trying to reduce and compress necessary topics into four years. I'm not speaking from the point of view of an academic but from the point of view of a student who had to go into the literature, sometimes all the way back to 1908 to find substantial pedagogical discussion on core EE topics.

    A CS grad not knowing what a cache is is a glaring gap, but so is an EE grad not knowing how a circuit works. I think the former is down to a poor student and the latter to weak pedagogy.
  • edited July 2014
    g0blinish wrote: »

    That's the first time I've seen the instruction SLIA.
  • edited July 2014
    Morkin wrote: »
    That's the first time I've seen the instruction SLIA.
    It's known to the rest of the world as SLL; no idea where he got SLIA from.
    ...
    I wanna tell you a story 'bout a woman I know...
  • edited July 2014
    karingal wrote: »
    It's known to the rest of the world as SLL, no idea where he got SLIA from
    ...
    Well SLIA is certainly a more accurate description of what that operation does on a Z80. I'm pretty sure I also recall it being referred to as SLI (Shift Left Illogical) in the distant past - also more accurate than SLL :)
  • edited July 2014
    This Z80 instruction was supposed to be named SLL (Shift Left Logical) officially. However, since it didn't work as intended, the original Your Spectrum article decided to name it SLS (Shift Left Set) instead.

    Both names are quite popular and can be found in numerous references. There was no need to invent a new mnemonic for an old instruction!
    Creator of ZXDB, BIFROST/NIRVANA, ZX7/RCS, etc. I don't frequent this forum anymore, please look for me elsewhere.
  • edited July 2014
    There's a surprising number of Google hits for "Shift Left Inverted Arithmetic" (well, it surprised me, anyway). I wonder where it stems from? Maybe the CP/M or TI-calc scenes or something.
  • edited July 2014
    www.z80.info has the following:
    Shift Left with One Insertion (SL1)

    This instruction operates like SLA, except that a 1 is inserted into the low bit of the operand.

    This instruction is inappropriately called SLL in other documents. (Probably for symmetry in the naming scheme.)
    Instruction    Opcode
    SL1 r          CB 30+r
    SL1 (IX+nn)    DD CB nn 36
    SL1 (IY+nn)    FD CB nn 36

    Register encoding r: 0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A

    Assemblers that support it usually accept SLL though.
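    For anyone wanting to check the behaviour described, the difference between SLA and SL1/SLL can be sketched in C. This is only the result byte; the real instruction also sets the carry flag from bit 7 and other flags, which this simplification ignores:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* SLA: documented shift left, a 0 enters bit 0 */
    uint8_t sla(uint8_t r) { return (uint8_t)(r << 1); }

    /* SL1/SLL: undocumented shift left, a 1 enters bit 0 */
    uint8_t sl1(uint8_t r) { return (uint8_t)((r << 1) | 1); }

    int main(void)
    {
        printf("SLA 41 -> %02X\n", sla(0x41));  /* 82 */
        printf("SL1 41 -> %02X\n", sl1(0x41));  /* 83 */
        return 0;
    }
    ```

    The `| 1` is the whole difference, and also why "Shift Left Logical" is a misnomer: a logical left shift should insert a 0.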
  • edited July 2014
    Yep, symmetry is absolutely correct.
    I wanna tell you a story 'bout a woman I know...