llvm-z80 status

edited September 2014 in Development
A few months ago I read about efforts to produce an LLVM back-end for the Z80 (perhaps even as part of the new z88dk). Is there any update on that?

I found this but I have no clue how stable it is and no way of contacting the author.

Comments

  • edited September 2014
    tstih wrote: »
    Few months ago I read about efforts to produce llvm back-end for the Z80 (even perhaps as part of new z88dk). Any update on that?

    I found this but I have no clue how stable it is and no way of contacting the author.

    I don't think it's ever emitted z80 code. The author is clearly a Russian zx spectrum user so you may be able to find him at zx.pk.ru
  • edited September 2014
    I found the author at zx.pk.ru.

    Some generated code:
     int a;
     int b;
     int c;
    
     int main (int argc, char * argv []) {
             c = a + b;
             return (0);
     } 
    
    ==>
    
      .file "test.c"
             .text
             .globl main
             .type main, @ function
     main:
             push ix
             push de
             push bc
             ld ix, 0
             add ix, sp
             ld sp, ix
             ld hl, b
             inc hl
             ld b, (hl)
             ld a, (b)
             ld c, a
             ld hl, a
             inc hl
             ld d, (hl)
             ld a, (a)
             ld e, a
             ld h, b
             ld l, c
             add hl, de
             ld b, h
             ld c, l
             ld hl, c
             inc hl
             ld (hl), b
             ld a, c
             ld (c), a
             ld hl, 0
             pop bc
             pop de
             pop ix
             ret
     .tmp0:
             .size main, .tmp0-main
    
             .type a, @ object
             .comm a, 2,1
             .type b, @ object
             .comm b, 2,1
             .type c, @ object
             .comm c, 2,1 
    

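    There are several non-existent instructions there, confusion between the names of registers and variables (the globals a, b and c collide with the registers of the same name, as in "ld a, (b)" and "ld hl, c"), incorrect code generation, and no optimization, so it's just a start. The author says at the end that the project is on hold.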

    For comparison z88dk output:
    ;	SECTION	code
    
    
    ._main
    	ld	de,(_a)
    	ld	hl,(_b)
    	add	hl,de
    	ld	(_c),hl
    	ld	hl,0	;const
    	ret
    
    
    
    
    ; --- Start of Static Variables ---
    
    ;	SECTION	bss
    
    ._a	defs	2
    ._b	defs	2
    ._c	defs	2
    

    There's also discussion in that thread about assigning frame pointer to ix and using iy for things. There's a problem with all the modern compiler tools out there and that is they assume architectures with orthogonal instruction sets and many addressing modes, in particular a fast stack relative mode. Their code generator models will always create poor code for more limited 8-bit architectures. Code generation strategies have to be different than current textbook practice if you want results similar to hand-coded asm on limited targets and no one is currently doing that.

    The most glaring example is deciding a priori that ix will be a frame pointer. I've written tens of thousands of lines of z80 asm over the years and only a few times have I used ix as a frame pointer in my own code. The reason is its use is slow and memory-wasteful. Once your compiler decides it's using ix, it will never be capable of generating roughly equivalent code to hand-generated asm. The code generators must be capable of doing what humans do -- if a register spill must occur (and there should be an appropriate penalty for that so there is incentive to keep tight loops in registers) the spilled registers should be pushed and popped from the top of the stack.
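
    To put rough numbers on the "slow and memory-wasteful" point, here is a small C sketch that compares spilling and reloading 16-bit register pairs through ix-relative slots against push/pop. The cycle and byte counts are the documented Z80 figures for "ld (ix+d),r" / "ld r,(ix+d)" (19 T-states, 3 bytes each), "push rr" (11 T-states, 1 byte) and "pop rr" (10 T-states, 1 byte). The struct and function names are made up for illustration, and the one-off cost of setting up ix as a frame pointer is not counted.

    ```c
    /* Cost of an instruction sequence: clock cycles (T-states) and code bytes. */
    struct z80_cost {
        unsigned t_states;
        unsigned bytes;
    };

    /* Documented Z80 costs for the instructions involved in a spill. */
    enum {
        LD_IX_T = 19, LD_IX_BYTES = 3,  /* ld (ix+d),r  and  ld r,(ix+d) */
        PUSH_T  = 11, PUSH_BYTES  = 1,  /* push rr */
        POP_T   = 10, POP_BYTES   = 1   /* pop rr  */
    };

    /* Spill and reload `pairs` register pairs through ix-relative frame slots.
       Each pair needs two 8-bit stores and, later, two 8-bit loads. */
    struct z80_cost spill_via_ix(unsigned pairs)
    {
        struct z80_cost c;
        c.t_states = pairs * 4 * LD_IX_T;
        c.bytes    = pairs * 4 * LD_IX_BYTES;
        return c;
    }

    /* Spill and reload `pairs` register pairs with one push and one pop each. */
    struct z80_cost spill_via_stack(unsigned pairs)
    {
        struct z80_cost c;
        c.t_states = pairs * (PUSH_T + POP_T);
        c.bytes    = pairs * (PUSH_BYTES + POP_BYTES);
        return c;
    }
    ```

    For a single pair that works out to 76 T-states and 12 bytes through ix versus 21 T-states and 2 bytes with push/pop. The catch is that push/pop only works when spills and reloads nest in last-in, first-out order, which is exactly the discipline a hand coder follows.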

    The code generators should also be doing register allocation for the most nested basic blocks first and then working with those allocations in enclosing blocks. That is how humans write optimal code -- optimize the most nested part first as that is where the execution time is spent. Right now the decision to have a frame pointer in every subroutine means each subroutine has register allocation done in isolation and communication between subroutines is via pushing params on stack rather than via register. That is a big deal in terms of performance.
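
    As a very rough sketch of the "most nested first" ordering, assuming the compiler already knows the loop depth of each basic block, the allocator could simply visit blocks sorted by depth. The struct layout and function names here are hypothetical and not taken from any existing compiler; a real allocator would of course also have to reconcile the allocations between blocks.

    ```c
    #include <stdlib.h>

    /* Hypothetical summary of a basic block, just enough to order allocation. */
    struct block {
        int id;          /* position of the block in the function */
        int loop_depth;  /* 0 = not inside any loop */
    };

    /* qsort comparator: deeper blocks first, ties broken by original position. */
    static int deeper_first(const void *a, const void *b)
    {
        const struct block *x = a;
        const struct block *y = b;

        if (x->loop_depth != y->loop_depth)
            return (y->loop_depth > x->loop_depth) - (y->loop_depth < x->loop_depth);
        return (x->id > y->id) - (x->id < y->id);
    }

    /* Reorder blocks so the register allocator sees the innermost code first. */
    void order_for_allocation(struct block *blocks, size_t n)
    {
        qsort(blocks, n, sizeof blocks[0], deeper_first);
    }
    ```

    The enclosing blocks are then allocated around whatever the inner blocks have already claimed, rather than the other way round.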

    Anyway if someone is thinking of trying a new code generator, please keep that in mind. Solving these issues will create a step up in the quality of compiler output and without them, asm-like performance is only approachable via asm-coded library routines.
  • edited September 2014
    Anyway if someone is thinking of trying a new code generator, please keep that in mind. Solving these issues will create a step up in the quality of compiler output and without them, asm-like performance is only approachable via asm-coded library routines.

    I gave up hope of implementing my own C compiler. :) But I would still like to succeed in my effort to implement C source level debugging on ZX Spectrum. I am now at this point - gdb-z80 stub on ZX Spectrum.

    From here on my options are limited. To implement C source level debugging one needs to generate stabs or stabs+ compatible debugging info and insert it into generated assembly code. I also need to be able to use binutils-z80 which produces gdb-z80 compatible debuggables.

    I found this SDCC upgrade but it won't compile, and it does not generate stabs/stabs+. My program depends heavily on pointers to functions, which z88dk does not fully support, so I also can't switch to that compiler, which (if I read correctly) supports the binutils assembler dialect.

    Even if stabs were implemented one would still need to update gdb-z80 code to handle function frames (I am not sure what that is yet; it has to do with how arguments are passed into functions) correctly. That is why I was looking into llvm-z80. I assumed that clang supports all of these by default.

    P.S. This guy, Alex Tsidaev, even implemented a plugin for a ZX Spectrum emulator that supports the GDB protocol, so that no stub is needed. That is doable for FUSE. :-D
  • edited September 2014
    tstih wrote: »
    From here on my options are limited. To implement C source level debugging one needs to generate stabs or stabs+ compatible debugging info and insert it into generated assembly code. I also need to be able to use binutils-z80 which produces gdb-z80 compatible debuggables.

    So z80-gdb requires coff format object files with stabs information inside? In that case I think you do have to use binutils as the last step assembler.
    I found this SDCC upgrade but it won't compile. And it does not generate stabs/stabs+.

    It looks like it's an independent debugger but it's still missing a lot (like symbols, for instance). It looks like it only supports single-stepping and breakpoints at the asm code level.
    As my program heavily depends on pointers to functions and z88dk does not support these completely; I also can't switch to this compiler which -if I read correctly- supports binutils assembler dialect.

    Function pointers are not typed. When I use them I just use void* as the type.

    It does look like sccz80 will output binutils-compatible assembler. But the libraries are written in z80asm assembler (ie all the directives are z80asm directives). I wonder if it would be easy to write a translator to automatically translate the z80asm code into binutils code.

    If you could arrange to use binutils as the backend assembler, you still have to insert the stabs directives somehow. In the library code that would be done by hand. The sccz80 compiler would have to be changed to emit stabs information when in binutils mode.

    sdcc has already been modified to generate z80asm assembler so adding binutils as another output option shouldn't be hard to do.

    z80asm was recently modified to support sections and with that, sdcc should be easy to get working with the z88dk libraries. So the prospect is there to have sdcc generating binutils asm with z88dk libs in binutils format (if the translator idea is simple). But again you will need to modify the code generator to insert stabs directives when in binutils mode. sccz80 is fairly simple to change for that purpose but sdcc is more complicated, at least to my unaccustomed eyes.
    P.S. This guy, Alex Tsidaev, even implemented a plugin for a ZX Spectrum emulator that supports the GDB protocol, so that no stub is needed. That is doable for FUSE. :-D

    That must also be a symbol-less debug?
  • edited September 2014
    So z80-gdb requires coff format object files with stabs information inside? In that case I think you do have to use binutils as the last step assembler.

    Yes. That is what the sdcc-z80-gas link that I posted is. An upgrade to sdcc to produce binutils-z80 specific assembly.
    Function pointers are not typed. When I use them I just use void* as the type.

    Is it an error if I use prototypes i.e. if I write (void *)(*malloc)() into header file or are these ignored and just no type checking is performed?
    It does look like sccz80 will output binutils-compatible assembler. But the libraries are written in z80asm assembler (ie all the directives are z80asm directives). I wonder if it would be easy to write a translator to automatically translate the z80asm code into binutils code.
    If you could arrange to use binutils as the backend assembler, you still have to insert the stabs directives somehow.

    Yes. The trouble, if I understand correctly how this works, is the stabs. Stabs are textual additions to the assembly code (embedded as assembler directives such as .stabs). These enriched asm files are then fed into gas (the binutils-z80 assembler) to produce the correct output format (coff) for the GNU Debugger. They need to be inserted by the C compiler itself, because the stabs contain information about function names, arguments, stack, static variables, types, etc. of the C code. That is the way to enable C source level debugging via GDB.
    z80asm was recently modified to support sections and with that, sdcc should be easy to get working with the z88dk libraries

    The last time I looked the binary compatibility was a problem due to the order in which parameters were passed to a function on a stack and some optimizations such as HL passed arguments in z88dk. So I am not sure that would work out of the box.
    That must also be a symbol-less debug?

    One does not need symbols on emulator side. Just a protocol with several simple functions such as read memory block, write memory block, set pc to, query value of registers. All symbols are resolved on the gdb-z80 side when it reads the debug info from compiled binary. It strips this info on PC side and forwards pure binary to the actual hardware.
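
    To give an idea of how little the emulator (or stub) side has to understand, here is a minimal sketch of the packet framing used by the GDB remote serial protocol: each packet is "$", the payload, "#", and a two-digit hex checksum that is the modulo-256 sum of the payload bytes. The "g" payload asks for the registers and "m addr,length" asks for a block of memory. The function names are my own, and escaping of special bytes in binary payloads is left out.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Modulo-256 sum of the payload bytes, as the remote protocol requires. */
    unsigned char rsp_checksum(const char *payload)
    {
        unsigned sum = 0;

        while (*payload)
            sum += (unsigned char)*payload++;
        return (unsigned char)(sum & 0xff);
    }

    /* Frame `payload` as "$<payload>#<two hex digits>" into `out`.
       Returns the packet length, or -1 if `out` is too small. */
    int rsp_frame(const char *payload, char *out, size_t out_size)
    {
        size_t need = strlen(payload) + 5;  /* '$' + '#' + 2 hex digits + NUL */

        if (out_size < need)
            return -1;
        return snprintf(out, out_size, "$%s#%02x",
                        payload, (unsigned)rsp_checksum(payload));
    }
    ```

    Framing "g" gives "$g#67", and a request for 0x10 bytes at the arbitrary address 0x4000 is framed as "$m4000,10#be". Everything that involves symbols, types or source lines stays on the gdb side.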

    At present I run a special ROM to use FUSE to debug using gnu-debugger. But if the protocol was natively supported by the emulator one could debug on ZX Spectrum using standard Sinclair ROM. That is what these guys did. They produced emulated "hardware debugger" that speaks the "gdb stub" protocol and hooks into emulator to set hardware breakpoints, watchpoints, etc.
  • edited September 2014
    tstih wrote: »
    Is it an error if I use prototypes i.e. if I write (void *)(*malloc)() into header file or are these ignored and just no type checking is performed?

    That prototype with empty parameter list will work I think but if you try to list parameters it will not be accepted. sccz80 does not carry any type information about function pointers at all and just lets you do whatever you want.

    I tend to declare function pointers as generic void* because I can never remember what can and can't be declared:

    void *p_malloc;
    void *p_free;
    ...
    p_malloc = malloc;
    p_free = free;
    ...

    int *addr = (int *)((p_malloc)(10*sizeof(int)));

    For treating the void* as a function pointer I prefer the "(fptr)(param list)" form. I think "(*fptr)(param list)" will also work -- obviously if the pointer is void* the compiler has no idea what it is and you have to write invocation in a way where the compiler knows it can only be a function pointer.

    Proper typing of function pointers in sccz80 is a little harder than in sdcc as there is more than one calling convention.
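
    For comparison only, this is roughly what the same thing looks like in a compiler that does carry function pointer types (plain standard C, not sccz80 as described above). The typedef and function names are made up for the example.

    ```c
    #include <stdlib.h>

    /* Typed pointers matching the standard malloc and free signatures. */
    typedef void *(*alloc_fn)(size_t);
    typedef void (*release_fn)(void *);

    /* Allocate an array of `count` ints through whichever allocator is passed in.
       The compiler can check both the argument and the return type here. */
    int *alloc_ints(alloc_fn alloc, size_t count)
    {
        return (int *)alloc(count * sizeof(int));
    }
    ```

    A caller would write something like "int *addr = alloc_ints(malloc, 10);". With typed pointers the compiler also knows which calling convention to use, which is the part that gets harder when a compiler has several of them.
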
    Yes. The trouble, if I understand correctly how this works, is the stabs. Stabs are textual additions to the assembly code (embedded as assembler directives such as .stabs). These enriched asm files are then fed into gas (the binutils-z80 assembler) to produce the correct output format (coff) for the GNU Debugger. They need to be inserted by the C compiler itself, because the stabs contain information about function names, arguments, stack, static variables, types, etc. of the C code. That is the way to enable C source level debugging via GDB.

    The project you linked for the emulator-supported gdb does get sdcc to emit some symbol info. You can spot it in the source code by looking at the files marked "GNU modified".

    Have you tried contacting the author about this? I think this is something sdcc would be interested in supporting natively, so maybe start with the latest source and look at making the same modifications this fellow did to get it working. I'd make sure that debugging info is only emitted if requested (like a -g) and that the stabs output depends on the current assembler target. For assemblers other than binutils maybe emit stabs as a comment? ("; STABS ....") That way if anyone wants to support it in another assembler, it would be easy.
    The last time I looked the binary compatibility was a problem due to the order in which parameters were passed to a function on a stack and some optimizations such as HL passed arguments in z88dk. So I am not sure that would work out of the box.

    We've solved this already. There are actually two independent clibs in z88dk right now -- the original one and the new one.

    The original one is using several different calling conventions and the C linkage is L->R order rather than R->L order. For that one, sdcc has been modified to be able to generate L->R order but a lot of library code will still fail to link properly unmodified because the other linkages -- callee and fastcall -- are commonly used.

    The new one has separated the asm implementation of the library from the C preambles which means any C compiler can use it by writing new preamble code. We've done that already for sdcc so the entire lib is sdcc friendly. I have been compiling sdcc programs using it, eg, but we've had trouble with getting global init info out of sdcc, getting a correct label to main and a few other things so that I can only run the results by modifying sdcc's output by hand. The introduction of sections into z80asm should automagically solve these issues. We're using z80asm as the assembler underneath sdcc as that's what the z88dk libraries are written for.

    The new lib is rooted here. I'm going through it again now (in my personal repository) to introduce sections, fix some organization issues, and introduce the driver model.

    To show you how the sdcc C interface works, you can see how the sdcc C vfprintf incorporates the z80 implementation of vfprintf. The C interface code collects parameters appropriately for the compiler and then INCLUDEs the z80 implementation to avoid the overhead of a jump. It's in the C preamble code that you would have to manually insert stabs information, maybe as little as a stabs directive to name the library function and list its parameters.

    That example C preamble comes from the sdcc IY library. The library assumes it can use one index register either IX or IY. It is written to use IX but an assemble-time switch can generate code that uses IY instead. The C preamble I showed is for the IY version of the library and that's why we can "use" IX (actually IY) without saving it for sdcc. The IX version of the library is much more ugly as it must save IX for sdcc if the lib uses IX. The IX version of sdcc vfprintf preamble.

    Inside the z80 implementations of some functions which walk the stack are IFs that test which compiler generated the code. Functions like printf need to walk the stack to collect an unknown number of parameters. So the sccz80 version walks down the stack when collecting the next param (L->R order) and the sdcc version walks up (R->L). Likewise invoking functions through pointers (see bsearch & qsort) have to have params pushed onto the stack in the right order. There are things to think about but it's been solved and it's all ready to go :)
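
    The walking direction can be modelled in a few lines of C. This is only a model, not library code: the array stands in for the words the caller pushed, with index 0 at the lowest address (the most recently pushed word, since the z80 stack grows downwards), and all names are invented for the example.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    enum push_order { PUSH_L_TO_R, PUSH_R_TO_L };

    /* Return argument `i` in source order (0 = leftmost) from the pushed words.
       words[0] is the lowest address, i.e. the word that was pushed last. */
    uint16_t nth_arg(const uint16_t *words, size_t nargs, size_t i,
                     enum push_order order)
    {
        if (order == PUSH_R_TO_L)
            return words[i];          /* leftmost was pushed last: walk up */
        return words[nargs - 1 - i];  /* leftmost was pushed first: walk down */
    }
    ```

    For a call f(10, 20, 30), L->R pushing leaves the words as {30, 20, 10} from the lowest address up, while R->L pushing leaves {10, 20, 30}. Note that in the L->R case the callee cannot find the leftmost argument without knowing how many words were pushed.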

    An automatic translator from z80asm to binutils code I think only has to worry about a 1:1 replacement of directives. Maybe there's only a handful of troublesome cases:

    * we do have some source files that are effectively empty. There is some object oriented asm code in the library and some inheritance relationships show up as aliases. So, eg, b_array_empty (is the array type empty?) and b_vector_empty (is the vector empty?) share the same code. Rather than doing this:

    b_vector_empty: jp b_array_empty

    which takes up three bytes, we do this:

    defc b_vector_empty = b_array_empty

    which takes 0 bytes. I think some linkers might have difficulty with a zero-size object file.


    * sections are implemented such that the linker can aggregate them. So we have (or will have) a fairly fine-grain division of library code into different sections so the end user will have great flexibility in assigning library code to different memory segments.

    All malloc code (malloc, free, realloc, aligned_alloc,....) goes into its own code/bss/data/data2/rodata sections. They are named seg_code_malloc / seg_bss_malloc, seg_data_malloc, etc.

    In the crt we collect those sections together into segments as desired. If there is one code segment, all the code sections are added to it:

    section CODE
    org 0
    section seg_code_malloc
    section seg_code_stdio
    ...

    Sections without an ORG are homeless and will just be merged into a previous section that does have an ORG (CODE above). The linker will output a single CODE.bin file with all that stuff in there.

    I'm not sure how compatible this is with binutils, but in the worst case you can probably translate "section seg_code_malloc" into "section code", as I know binutils, at minimum, supports CODE / BSS / DATA sections.
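
    As a sketch of what such a 1:1 translator might look like, here is a line-by-line rewrite in C that handles just the two z80asm directives shown above. The target spellings (".set alias, target" and ".section name") are ordinary GNU as directives, but whether they are the right replacements for the z88dk sources is an assumption that would need checking, in particular for aliases of symbols defined in another object file. The function name is hypothetical, the match is case-sensitive, and anything else (including "org") is passed through untouched.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Translate one z80asm source line into `out`.
       Returns 1 if a directive was rewritten, 0 if the line was copied as-is. */
    int translate_line(const char *in, char *out, size_t out_size)
    {
        char word[16], lhs[64], rhs[64], name[64];
        int used = 0;

        /* Pick up the first word and remember where the rest of the line starts. */
        if (sscanf(in, " %15s%n", word, &used) == 1) {
            const char *rest = in + used;

            /* defc alias = target   ->   .set alias, target */
            if (strcmp(word, "defc") == 0 &&
                sscanf(rest, " %63[^= \t] = %63s", lhs, rhs) == 2) {
                snprintf(out, out_size, ".set %s, %s", lhs, rhs);
                return 1;
            }
            /* section name   ->   .section name */
            if (strcmp(word, "section") == 0 &&
                sscanf(rest, " %63s", name) == 1) {
                snprintf(out, out_size, ".section %s", name);
                return 1;
            }
        }
        snprintf(out, out_size, "%s", in);  /* unrecognised: leave unchanged */
        return 0;
    }
    ```

    Feeding it "defc b_vector_empty = b_array_empty" produces ".set b_vector_empty, b_array_empty", and "section seg_code_malloc" becomes ".section seg_code_malloc". The ORG lines are the part with no direct equivalent, since with binutils placement is normally decided at link time.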




    Anyway, I'd say get the C compiler generating symbol data first. sdcc is probably the best choice as all its library code is written in C so you have a great deal of symbol information available automatically if you can get the compiler to generate it.