6ad3bab7fe
Make the source code for the documentation a little easier to deal with by breaking it into individual chapter files. Add support to rdsrc.pl for auto-generating dependencies. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
\C{32bit} Writing 32-bit Code (Unix, Win32, DJGPP)
|
|
|
|
This chapter attempts to cover some of the common issues involved
|
|
when writing 32-bit code, to run under \i{Win32} or Unix, or to be
|
|
linked with C code generated by a Unix-style C compiler such as
|
|
\i{DJGPP}. It covers how to write assembly code to interface with
|
|
32-bit C routines, and how to write position-independent code for
|
|
shared libraries.
|
|
|
|
Almost all 32-bit code, and in particular all code running under
\c{Win32}, \c{DJGPP} or any of the PC Unix variants, runs in the \I{flat
memory model}\e{flat} memory model. This means that the segment registers
and paging have already been set up to give you the same 32-bit 4GB
|
|
address space no matter what segment you work relative to, and that
|
|
you should ignore all segment registers completely. When writing
|
|
flat-model application code, you never need to use a segment
|
|
override or modify any segment register, and the code-section
|
|
addresses you pass to \c{CALL} and \c{JMP} live in the same address
|
|
space as the data-section addresses you access your variables by and
|
|
the stack-section addresses you access local variables and procedure
|
|
parameters by. Every address is 32 bits long and contains only an
|
|
offset part.
|
|
|
|
|
|
\H{32c} Interfacing to 32-bit C Programs
|
|
|
|
A lot of the discussion in \k{16c}, about interfacing to 16-bit C
|
|
programs, still applies when working in 32 bits. The absence of
|
|
memory models or segmentation worries simplifies things a lot.
|
|
|
|
|
|
\S{32cunder} External Symbol Names
|
|
|
|
Most 32-bit C compilers share the convention used by 16-bit
|
|
compilers, that the names of all global symbols (functions or data)
|
|
they define are formed by prefixing an underscore to the name as it
|
|
appears in the C program. However, not all of them do: the \c{ELF}
|
|
specification states that C symbols do \e{not} have a leading
|
|
underscore on their assembly-language names.
|
|
|
|
The older Linux \c{a.out} C compiler, all \c{Win32} compilers,
|
|
\c{DJGPP}, and \c{NetBSD} and \c{FreeBSD}, all use the leading
|
|
underscore; for these compilers, the macros \c{cextern} and
|
|
\c{cglobal}, as given in \k{16cunder}, will still work. For \c{ELF},
|
|
though, the leading underscore should not be used.
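
As a minimal sketch of how one source file might serve both
conventions, the \c{cextern} macro from \k{16cunder} can be made
conditional. The symbol \c{UNDERSCORE} is an assumption made for this
example, not something NASM defines: you would pass \c{-DUNDERSCORE}
on the command line when assembling for a target that uses the
prefix. \c{cglobal} could be treated in the same way.

\c %macro  cextern 1
\c   %ifdef UNDERSCORE         ; hypothetical, set with -DUNDERSCORE
\c         extern  _%1
\c     %define %1 _%1          ; later uses of the name become _name
\c   %else
\c         extern  %1          ; ELF: use the C name unchanged
\c   %endif
\c %endmacro
\c
\c cextern printf
\c
\c         call    printf      ; refers to _printf or to printf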
|
|
|
|
See also \k{opt-pfix}.
|
|
|
|
\S{32cfunc} Function Definitions and Function Calls
|
|
|
|
\I{functions, C calling convention}The \i{C calling convention}
|
|
in 32-bit programs is as follows. In the following description,
|
|
the words \e{caller} and \e{callee} are used to denote
|
|
the function doing the calling and the function which gets called.
|
|
|
|
\b The caller pushes the function's parameters on the stack, one
|
|
after another, in reverse order (right to left, so that the first
|
|
argument specified to the function is pushed last).
|
|
|
|
\b The caller then executes a near \c{CALL} instruction to pass
|
|
control to the callee.
|
|
|
|
\b The callee receives control, and typically (although this is not
|
|
actually necessary, in functions which do not need to access their
|
|
parameters) starts by saving the value of \c{ESP} in \c{EBP} so as
|
|
to be able to use \c{EBP} as a base pointer to find its parameters
|
|
on the stack. However, the caller was probably doing this too, so
|
|
part of the calling convention states that \c{EBP} must be preserved
|
|
by any C function. Hence the callee, if it is going to set up
|
|
\c{EBP} as a \i{frame pointer}, must push the previous value first.
|
|
|
|
\b The callee may then access its parameters relative to \c{EBP}.
|
|
The doubleword at \c{[EBP]} holds the previous value of \c{EBP} as
|
|
it was pushed; the next doubleword, at \c{[EBP+4]}, holds the return
|
|
address, pushed implicitly by \c{CALL}. The parameters start after
|
|
that, at \c{[EBP+8]}. The leftmost parameter of the function, since
|
|
it was pushed last, is accessible at this offset from \c{EBP}; the
|
|
others follow, at successively greater offsets. Thus, in a function
|
|
such as \c{printf} which takes a variable number of parameters, the
|
|
pushing of the parameters in reverse order means that the function
|
|
knows where to find its first parameter, which tells it the number
|
|
and type of the remaining ones.
|
|
|
|
\b The callee may also wish to decrease \c{ESP} further, so as to
|
|
allocate space on the stack for local variables, which will then be
|
|
accessible at negative offsets from \c{EBP}.
|
|
|
|
\b The callee, if it wishes to return a value to the caller, should
|
|
leave the value in \c{AL}, \c{AX} or \c{EAX} depending on the size
|
|
of the value. Floating-point results are typically returned in
|
|
\c{ST0}.
|
|
|
|
\b Once the callee has finished processing, it restores \c{ESP} from
|
|
\c{EBP} if it had allocated local stack space, then pops the previous
|
|
value of \c{EBP}, and returns via \c{RET} (equivalently, \c{RETN}).
|
|
|
|
\b When the caller regains control from the callee, the function
|
|
parameters are still on the stack, so it typically adds an immediate
|
|
constant to \c{ESP} to remove them (instead of executing a number of
|
|
slow \c{POP} instructions). Thus, if a function is accidentally
|
|
called with the wrong number of parameters due to a prototype
|
|
mismatch, the stack will still be returned to a sensible state since
|
|
the caller, which \e{knows} how many parameters it pushed, does the
|
|
removing.
|
|
|
|
There is an alternative calling convention used by Win32 programs
|
|
for Windows API calls, and also for functions called \e{by} the
|
|
Windows API such as window procedures: they follow what Microsoft
|
|
calls the \c{__stdcall} convention. This is slightly closer to the
|
|
Pascal convention, in that the callee clears the stack by passing a
|
|
parameter to the \c{RET} instruction. However, the parameters are
|
|
still pushed in right-to-left order.
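
A callee following this convention might look like the sketch below.
The function and the \c{@8} suffix on its name are assumptions for
illustration: several Win32 compilers decorate \c{__stdcall} names
with the number of parameter bytes, but you should check what your
own compiler emits.

\c global  _addpair@8          ; hypothetical: int __stdcall addpair(int,int)
\c
\c _addpair@8:
\c         push    ebp
\c         mov     ebp,esp
\c         mov     eax,[ebp+8]     ; first parameter
\c         add     eax,[ebp+12]    ; second parameter
\c         pop     ebp
\c         ret     8               ; return, discarding 8 bytes of parameters

The caller of such a function must \e{not} adjust \c{ESP} after the
\c{CALL}, since the parameters have already been removed.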
|
|
|
|
Thus, you would define a function in C style in the following way:
|
|
|
|
\c global  _myfunc
\c
\c _myfunc:
\c         push    ebp
\c         mov     ebp,esp
\c         sub     esp,0x40        ; 64 bytes of local stack space
\c         mov     ebx,[ebp+8]     ; first parameter to function
\c
\c         ; some more code
\c
\c         leave                   ; mov esp,ebp / pop ebp
\c         ret
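
The same skeleton can be filled out into a complete function. The
sketch below assumes a hypothetical C prototype \c{int sum3(int a,
int b, int c)}. It also assumes, as is the case for most 32-bit C
compilers, that the caller expects \c{EBX} to be preserved along with
\c{EBP}, so it saves and restores that register around its use.

\c global  _sum3
\c
\c _sum3:
\c         push    ebp
\c         mov     ebp,esp
\c         sub     esp,4           ; one doubleword of local space
\c         push    ebx             ; preserve EBX for the caller
\c
\c         mov     ebx,[ebp+8]     ; a (leftmost, so pushed last)
\c         add     ebx,[ebp+12]    ; b
\c         mov     [ebp-4],ebx     ; keep a+b in the local variable
\c         mov     eax,[ebp+16]    ; c
\c         add     eax,[ebp-4]     ; the result is returned in EAX
\c
\c         pop     ebx
\c         leave                   ; mov esp,ebp / pop ebp
\c         ret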
|
|
|
|
At the other end of the process, to call a C function from your
|
|
assembly code, you would do something like this:
|
|
|
|
\c extern  _printf
\c
\c       ; and then, further down...
\c
\c       push    dword [myint]         ; one of my integer variables
\c       push    dword mystring        ; pointer into my data segment
\c       call    _printf
\c       add     esp,byte 8            ; `byte' saves space
\c
\c       ; then those data items...
\c
\c segment _DATA
\c
\c myint         dd    1234
\c mystring      db    'This number -> %d <- should be 1234',10,0
|
|
|
|
This piece of code is the assembly equivalent of the C code
|
|
|
|
\c int myint = 1234;
|
|
\c printf("This number -> %d <- should be 1234\n", myint);
|
|
|
|
|
|
\S{32cdata} Accessing Data Items
|
|
|
|
To get at the contents of C variables, or to declare variables which
|
|
C can access, you need only declare the names as \c{GLOBAL} or
|
|
\c{EXTERN}. (Again, except under \c{ELF}, the names require leading
underscores, as stated in \k{32cunder}.) Thus, a C variable declared
as \c{int i} can be
|
|
accessed from assembler as
|
|
|
|
\c extern _i
|
|
\c mov eax,[_i]
|
|
|
|
And to declare your own integer variable which C programs can access
|
|
as \c{extern int j}, you do this (making sure you are assembling in
|
|
the \c{_DATA} segment, if necessary):
|
|
|
|
\c global _j
|
|
\c _j dd 0
|
|
|
|
To access a C array, you need to know the size of the components of
|
|
the array. For example, \c{int} variables are four bytes long, so if
|
|
a C program declares an array as \c{int a[10]}, you can access
|
|
\c{a[3]} by coding \c{mov eax,[_a+12]}. (The byte offset 12 is obtained
|
|
by multiplying the desired array index, 3, by the size of the array
|
|
element, 4.) The sizes of the C base types in 32-bit compilers are:
|
|
1 for \c{char}, 2 for \c{short}, 4 for \c{int}, \c{long} and
|
|
\c{float}, and 8 for \c{double}. Pointers, being 32-bit addresses,
|
|
are also 4 bytes long.
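
As a short sketch of that arithmetic, assuming the same array
\c{int a[10]} defined in the C program (and a compiler that uses the
leading underscore), the element size can be written into the
effective address either as a constant or as a scale factor:

\c extern  _a                      ; int a[10]; in the C program
\c
\c         mov     eax,[_a+3*4]    ; a[3]: index 3 times element size 4
\c
\c         mov     ecx,5
\c         mov     eax,[_a+ecx*4]  ; a[5], using a scaled index register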
|
|
|
|
To access a C \i{data structure}, you need to know the offset from
|
|
the base of the structure to the field you are interested in. You
|
|
can either do this by converting the C structure definition into a
|
|
NASM structure definition (using \c{STRUC}), or by calculating the
|
|
one offset and using just that.
|
|
|
|
To do either of these, you should read your C compiler's manual to
|
|
find out how it organizes data structures. NASM gives no special
|
|
alignment to structure members in its own \i\c{STRUC} macro, so you
|
|
have to specify alignment yourself if the C compiler generates it.
|
|
Typically, you might find that a structure like
|
|
|
|
\c struct {
\c     char c;
\c     int i;
\c } foo;
|
|
|
|
might be eight bytes long rather than five, since the \c{int} field
|
|
would be aligned to a four-byte boundary. However, this sort of
|
|
feature is sometimes a configurable option in the C compiler, either
|
|
using command-line options or \c{#pragma} lines, so you have to find
|
|
out how your own compiler does it.
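
If your compiler does pad the structure in this way, a matching NASM
definition might be sketched as follows. The name \c{foo_t} is
invented for this example, and the four-byte alignment is an
assumption that you should check against your compiler.

\c struc   foo_t
\c   .c:   resb    1           ; char c, at offset 0
\c         alignb  4           ; three bytes of padding
\c   .i:   resd    1           ; int i, at offset 4
\c endstruc                    ; foo_t_size is now 8
\c
\c extern  _foo
\c
\c         mov     eax,[_foo+foo_t.i]  ; read foo.i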
|
|
|
|
|
|
\S{32cmacro} \i\c{c32.mac}: Helper Macros for the 32-bit C Interface
|
|
|
|
Included in the NASM archives, in the \I{misc directory}\c{misc}
|
|
directory, is a file \c{c32.mac} of macros. It defines three macros:
|
|
\i\c{proc}, \i\c{arg} and \i\c{endproc}. These are intended to be
|
|
used for C-style procedure definitions, and they automate a lot of
|
|
the work involved in keeping track of the calling convention.
|
|
|
|
An example of an assembly function using the macro set is given
|
|
here:
|
|
|
|
\c proc    _proc32
\c
\c %$i     arg
\c %$j     arg
\c         mov     eax,[ebp + %$i]
\c         mov     ebx,[ebp + %$j]
\c         add     eax,[ebx]
\c
\c endproc
|
|
|
|
This defines \c{_proc32} to be a procedure taking two arguments, the
|
|
first (\c{i}) an integer and the second (\c{j}) a pointer to an
|
|
integer. It returns \c{i + *j}.
|
|
|
|
Note that the \c{arg} macro has an \c{EQU} as the first line of its
|
|
expansion, and since the label before the macro call gets prepended
|
|
to the first line of the expanded macro, the \c{EQU} works, defining
|
|
\c{%$i} to be an offset from \c{EBP}. A context-local variable is
|
|
used, local to the context pushed by the \c{proc} macro and popped
|
|
by the \c{endproc} macro, so that the same argument name can be used
|
|
in later procedures. Of course, you don't \e{have} to do that.
|
|
|
|
\c{arg} can take an optional parameter, giving the size of the
|
|
argument. If no size is given, 4 is assumed, since it is likely that
|
|
many function parameters will be of type \c{int} or pointers.
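
As a sketch of the optional size, consider a hypothetical function
\c{double scale(double d, int n)}, whose first argument occupies
eight bytes on the stack. The function name is an assumption for
illustration, and the \c{%include} line assumes that \c{c32.mac} can
be found on your include path.

\c %include "c32.mac"
\c
\c proc    _scale              ; hypothetical: double scale(double d, int n)
\c
\c %$d     arg     8           ; a double occupies 8 bytes
\c %$n     arg                 ; size defaults to 4
\c         fld     qword [ebp + %$d]
\c         fimul   dword [ebp + %$n]   ; the result is left in ST0
\c
\c endproc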
|
|
|
|
|
|
\H{picdll} Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF \i{Shared
|
|
Libraries}
|
|
|
|
\c{ELF} replaced the older \c{a.out} object file format under Linux
|
|
because it contains support for \i{position-independent code}
|
|
(\i{PIC}), which makes writing shared libraries much easier. NASM
|
|
supports the \c{ELF} position-independent code features, so you can
|
|
write Linux \c{ELF} shared libraries in NASM.
|
|
|
|
\i{NetBSD}, and its close cousins \i{FreeBSD} and \i{OpenBSD}, take
|
|
a different approach by hacking PIC support into the \c{a.out}
|
|
format. NASM supports this as the \i\c{aoutb} output format, so you
|
|
can write \i{BSD} shared libraries in NASM too.
|
|
|
|
The operating system loads a PIC shared library by memory-mapping
|
|
the library file at an arbitrarily chosen point in the address space
|
|
of the running process. The contents of the library's code section
|
|
must therefore not depend on where it is loaded in memory.
|
|
|
|
Therefore, you cannot get at your variables by writing code like
|
|
this:
|
|
|
|
\c mov eax,[myvar] ; WRONG
|
|
|
|
Instead, the linker provides an area of memory called the
|
|
\i\e{global offset table}, or \i{GOT}; the GOT is situated at a
|
|
constant distance from your library's code, so if you can find out
|
|
where your library is loaded (which is typically done using a
|
|
\c{CALL} and \c{POP} combination), you can obtain the address of the
|
|
GOT, and you can then load the addresses of your variables out of
|
|
linker-generated entries in the GOT.
|
|
|
|
The \e{data} section of a PIC shared library does not have these
|
|
restrictions: since the data section is writable, it has to be
|
|
copied into memory anyway rather than just paged in from the library
|
|
file, so as long as it's being copied it can be relocated too. So
|
|
you can put ordinary types of relocation in the data section without
|
|
too much worry (but see \k{picglobal} for a caveat).
|
|
|
|
|
|
\S{picgot} Obtaining the Address of the GOT
|
|
|
|
Each code module in your shared library should define the GOT as an
|
|
external symbol:
|
|
|
|
\c extern _GLOBAL_OFFSET_TABLE_ ; in ELF
|
|
\c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out
|
|
|
|
At the beginning of any function in your shared library which plans
|
|
to access your data or BSS sections, you must first calculate the
|
|
address of the GOT. This is typically done by writing the function
|
|
in this form:
|
|
|
|
\c func:   push    ebp
\c         mov     ebp,esp
\c         push    ebx
\c         call    .get_GOT
\c .get_GOT:
\c         pop     ebx
\c         add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
\c
\c         ; the function body comes here
\c
\c         mov     ebx,[ebp-4]
\c         mov     esp,ebp
\c         pop     ebp
\c         ret
|
|
|
|
(For BSD, again, the symbol \c{_GLOBAL_OFFSET_TABLE_} requires a
|
|
second leading underscore.)
|
|
|
|
The first two lines of this function are simply the standard C
|
|
prologue to set up a stack frame, and the last three lines are
|
|
standard C function epilogue. The third line, and the fourth to last
|
|
line, save and restore the \c{EBX} register, because PIC shared
|
|
libraries use this register to store the address of the GOT.
|
|
|
|
The interesting bit is the \c{CALL} instruction and the following
|
|
two lines. The \c{CALL} and \c{POP} combination obtains the address
|
|
of the label \c{.get_GOT}, without having to know in advance where
|
|
the program was loaded (since the \c{CALL} instruction is encoded
|
|
relative to the current position). The \c{ADD} instruction makes use
|
|
of one of the special PIC relocation types: \i{GOTPC relocation}.
|
|
With the \i\c{WRT ..gotpc} qualifier specified, the symbol
|
|
referenced (here \c{_GLOBAL_OFFSET_TABLE_}, the special symbol
|
|
assigned to the GOT) is given as an offset from the beginning of the
|
|
section. (Actually, \c{ELF} encodes it as the offset from the operand
|
|
field of the \c{ADD} instruction, but NASM simplifies this
|
|
deliberately, so you do things the same way for both \c{ELF} and
|
|
\c{BSD}.) So the instruction then \e{adds} the beginning of the section,
|
|
to get the real address of the GOT, and subtracts the value of
|
|
\c{.get_GOT} which it knows is in \c{EBX}. Therefore, by the time
|
|
that instruction has finished, \c{EBX} contains the address of the GOT.
|
|
|
|
If you didn't follow that, don't worry: it's never necessary to
|
|
obtain the address of the GOT by any other means, so you can put
|
|
those three instructions into a macro and safely ignore them:
|
|
|
|
\c %macro  get_GOT 0
\c
\c         call    %%getgot
\c   %%getgot:
\c         pop     ebx
\c         add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
\c
\c %endmacro
|
|
|
|
\S{piclocal} Finding Your Local Data Items
|
|
|
|
Having got the GOT, you can then use it to obtain the addresses of
|
|
your data items. Most variables will reside in the sections you have
|
|
declared; they can be accessed using the \I{GOTOFF
relocation}\c{..gotoff} special \I\c{WRT ..gotoff}\c{WRT} type, which
is used like this:
|
|
|
|
\c lea eax,[ebx+myvar wrt ..gotoff]
|
|
|
|
The expression \c{myvar wrt ..gotoff} is calculated, when the shared
|
|
library is linked, to be the offset to the local variable \c{myvar}
|
|
from the beginning of the GOT. Therefore, adding it to \c{EBX} as
|
|
above will place the real address of \c{myvar} in \c{EAX}.
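
Putting this together with the \c{get_GOT} macro from \k{picgot}, a
complete ELF function that returns the \e{contents} of a local
variable might be sketched as follows. The names \c{getvar} and
\c{myvar} are invented for this example, and the macro is assumed to
have been defined earlier in the source file.

\c extern  _GLOBAL_OFFSET_TABLE_
\c
\c section .data
\c
\c myvar:  dd      1234            ; hypothetical local variable
\c
\c section .text
\c
\c getvar: push    ebx             ; EBX must be preserved for the caller
\c         get_GOT                 ; leaves the GOT address in EBX
\c         mov     eax,[ebx+myvar wrt ..gotoff]    ; EAX = contents of myvar
\c         pop     ebx
\c         ret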
|
|
|
|
If you declare variables as \c{GLOBAL} without specifying a size for
|
|
them, they are shared between code modules in the library, but do
|
|
not get exported from the library to the program that loaded it.
|
|
They will still be in your ordinary data and BSS sections, so you
|
|
can access them in the same way as local variables, using the above
|
|
\c{..gotoff} mechanism.
|
|
|
|
Note that due to a peculiarity of the way BSD \c{a.out} format
|
|
handles this relocation type, there must be at least one non-local
|
|
symbol in the same section as the address you're trying to access.
|
|
|
|
|
|
\S{picextern} Finding External and Common Data Items
|
|
|
|
If your library needs to get at an external variable (external to
|
|
the \e{library}, not just to one of the modules within it), you must
|
|
use the \I{GOT relocations}\I\c{WRT ..got}\c{..got} type to get at
|
|
it. The \c{..got} type, instead of giving you the offset from the
|
|
GOT base to the variable, gives you the offset from the GOT base to
|
|
a GOT \e{entry} containing the address of the variable. The linker
|
|
will set up this GOT entry when it builds the library, and the
|
|
dynamic linker will place the correct address in it at load time. So
|
|
to obtain the address of an external variable \c{extvar} in \c{EAX},
|
|
you would code
|
|
|
|
\c mov eax,[ebx+extvar wrt ..got]
|
|
|
|
This loads the address of \c{extvar} out of an entry in the GOT. The
|
|
linker, when it builds the shared library, collects together every
|
|
relocation of type \c{..got}, and builds the GOT so as to ensure it
|
|
has every necessary entry present.
|
|
|
|
Common variables must also be accessed in this way.
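
Since the GOT entry holds only the address, reading the variable
itself takes a second memory access. The fragment below is a sketch
that assumes \c{EBX} already holds the GOT address; \c{extvar} and
\c{commvar} are hypothetical names.

\c extern  extvar
\c common  commvar 4
\c
\c         mov     eax,[ebx+extvar wrt ..got]  ; EAX = address of extvar
\c         mov     eax,[eax]                   ; EAX = contents of extvar
\c
\c         mov     ecx,[ebx+commvar wrt ..got] ; ECX = address of commvar
\c         mov     dword [ecx],0               ; store through the pointer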
|
|
|
|
|
|
\S{picglobal} Exporting Symbols to the Library User
|
|
|
|
If you want to export symbols to the user of the library, you have
|
|
to declare whether they are functions or data, and if they are data,
|
|
you have to give the size of the data item. This is because the
|
|
dynamic linker has to build \I{PLT}\i{procedure linkage table}
|
|
entries for any exported functions, and also moves exported data
|
|
items away from the library's data section in which they were
|
|
declared.
|
|
|
|
So to export a function to users of the library, you must use
|
|
|
|
\c global  func:function           ; declare it as a function
\c
\c func:   push    ebp
\c
\c         ; etc.
|
|
|
|
And to export a data item such as an array, you would have to code
|
|
|
|
\c global  array:data array.end-array      ; give the size too
\c
\c array:  resd    128
\c .end:
|
|
|
|
Be careful: If you export a variable to the library user, by
|
|
declaring it as \c{GLOBAL} and supplying a size, the variable will
|
|
end up living in the data section of the main program, rather than
|
|
in your library's data section, where you declared it. So you will
|
|
have to access your own global variable with the \c{..got} mechanism
|
|
rather than \c{..gotoff}, as if it were external (which,
|
|
effectively, it has become).
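
For instance, a library routine that increments its own exported
counter might be sketched like this. The names \c{counter} and
\c{bump} are invented for the example, and the \c{get_GOT} macro from
\k{picgot} is assumed to have been defined earlier in the file.

\c extern  _GLOBAL_OFFSET_TABLE_
\c global  counter:data 4          ; hypothetical exported variable
\c
\c section .data
\c
\c counter: dd     0
\c
\c section .text
\c
\c bump:   push    ebx
\c         get_GOT
\c         mov     eax,[ebx+counter wrt ..got] ; ..got, not ..gotoff
\c         inc     dword [eax]                 ; update the live copy
\c         pop     ebx
\c         ret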
|
|
|
|
Equally, if you need to store the address of an exported global in
|
|
one of your data sections, you can't do it by means of the standard
|
|
sort of code:
|
|
|
|
\c dataptr: dd global_data_item ; WRONG
|
|
|
|
NASM will interpret this code as an ordinary relocation, in which
|
|
\c{global_data_item} is merely an offset from the beginning of the
|
|
\c{.data} section (or whatever); so this reference will end up
|
|
pointing at your data section instead of at the exported global
|
|
which resides elsewhere.
|
|
|
|
Instead of the above code, then, you must write
|
|
|
|
\c dataptr: dd global_data_item wrt ..sym
|
|
|
|
which makes use of the special \c{WRT} type \I\c{WRT ..sym}\c{..sym}
|
|
to instruct NASM to search the symbol table for a particular symbol
|
|
at that address, rather than just relocating by section base.
|
|
|
|
Either method will work for functions: referring to one of your
|
|
functions by means of
|
|
|
|
\c funcptr: dd my_function
|
|
|
|
will give the user the address of the code you wrote, whereas
|
|
|
|
\c funcptr: dd my_function wrt ..sym
|
|
|
|
will give the address of the procedure linkage table for the
|
|
function, which is where the calling program will \e{believe} the
|
|
function lives. Either address is a valid way to call the function.
|
|
|
|
|
|
\S{picproc} Calling Procedures Outside the Library
|
|
|
|
Calling procedures outside your shared library has to be done by
|
|
means of a \i\e{procedure linkage table}, or \i{PLT}. The PLT is
|
|
placed at a known offset from where the library is loaded, so the
|
|
library code can make calls to the PLT in a position-independent
|
|
way. Within the PLT there is code to jump to offsets contained in
|
|
the GOT, so function calls to other shared libraries or to routines
|
|
in the main program can be transparently passed off to their real
|
|
destinations.
|
|
|
|
To call an external routine, you must use another special PIC
|
|
relocation type, \I{PLT relocations}\i\c{WRT ..plt}. This is much
|
|
easier than the GOT-based ones: you simply replace calls such as
|
|
\c{CALL printf} with the PLT-relative version \c{CALL printf WRT
|
|
..plt}.
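
A complete ELF routine that calls \c{printf} in this way might be
sketched as follows. The names \c{hello} and \c{msg} are invented for
the example, and the \c{get_GOT} macro from \k{picgot} is assumed to
have been defined earlier in the file. On 32-bit ELF systems the PLT
entries of a shared library typically expect \c{EBX} to hold the GOT
address at the time of the call, so the sketch loads it first.

\c extern  printf
\c extern  _GLOBAL_OFFSET_TABLE_
\c
\c section .data
\c
\c msg:    db      'hello from the library',10,0
\c
\c section .text
\c
\c hello:  push    ebx
\c         get_GOT                         ; EBX = address of the GOT
\c         lea     eax,[ebx+msg wrt ..gotoff]
\c         push    eax                     ; pointer to the format string
\c         call    printf wrt ..plt
\c         add     esp,4                   ; caller removes the parameter
\c         pop     ebx
\c         ret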
|
|
|
|
|
|
\S{link} Generating the Library File
|
|
|
|
Having written some code modules and assembled them to \c{.o} files,
|
|
you then generate your shared library with a command such as
|
|
|
|
\c ld -shared -o library.so module1.o module2.o # for ELF
|
|
\c ld -Bshareable -o library.so module1.o module2.o # for BSD
|
|
|
|
For ELF, if your shared library is going to reside in system
|
|
directories such as \c{/usr/lib} or \c{/lib}, it is usually worth
|
|
using the \i\c{-soname} flag to the linker, to store the final
|
|
library file name, with a version number, into the library:
|
|
|
|
\c ld -shared -soname library.so.1 -o library.so.1.2 *.o
|
|
|
|
You would then copy \c{library.so.1.2} into the library directory,
|
|
and create \c{library.so.1} as a symbolic link to it.
|
|
|
|
|