6ad3bab7fe
Make the source code for the documentation a little easier to deal with by breaking it into individual chapter files. Add support to rdsrc.pl for auto-generating dependencies. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
\C{outfmt} \i{Output Formats}

NASM is a portable assembler, designed to be able to compile on any
ANSI C-supporting platform and produce output to run on a variety of
Intel x86 operating systems. For this reason, it has a large number
of available output formats, selected using the \i\c{-f} option on
the NASM \i{command line}. Each of these formats, along with its
extensions to the base NASM syntax, is detailed in this chapter.

As stated in \k{opt-o}, NASM chooses a \i{default name} for your
output file based on the input file name and the chosen output
format. This will be generated by removing the \i{extension}
(\c{.asm}, \c{.s}, or whatever you like to use) from the input file
name, and substituting an extension defined by the output format.
The extensions are given with each format below.


\H{binfmt} \i\c{bin}: \i{Flat-Form Binary}\I{pure binary} Output

The \c{bin} format does not produce object files: it generates
nothing in the output file except the code you wrote. Such `pure
binary' files are used by \i{MS-DOS}: \i\c{.COM} executables and
\i\c{.SYS} device drivers are pure binary files. Pure binary output
is also useful for \i{operating system} and \i{boot loader}
development.

The \c{bin} format supports \i{multiple section names}. For details of
how NASM handles sections in the \c{bin} format, see \k{multisec}.

Using the \c{bin} format puts NASM by default into 16-bit mode (see
\k{bits}). In order to use \c{bin} to write 32-bit or 64-bit code,
such as an OS kernel, you need to explicitly issue the \I\c{BITS}\c{BITS 32}
or \I\c{BITS}\c{BITS 64} directive.
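
As a minimal sketch of this point, a flat binary meant to run in 32-bit
mode might begin as follows. The load address \c{0x100000} and the
labels \c{entry} and \c{message} are assumptions chosen purely for
illustration; NASM does not require any of them:

\c         bits 32                 ; bin starts in 16-bit mode; override it
\c         org 0x100000            ; assumed load address, for illustration
\c
\c entry:  mov eax,message         ; assembled as a 32-bit immediate move
\c         jmp entry               ; placeholder loop
\c
\c message: db 'hello',0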

\c{bin} has no default output file name extension: instead, it
leaves your file name as it is once the original extension has been
removed. Thus, the default is for NASM to assemble \c{binprog.asm}
into a binary file called \c{binprog}.

It is extremely important to understand that the binary output format
is simply nothing other than \e{a linker built into the NASM
executable.} As such, NASM behaves just as it does when producing any
other output format: notably the list file reflects the code output
\e{before} relocation, and the addresses in the list file are
addresses relative to the start of the current output section.


\S{org} \i\c{ORG}: Binary File \i{Program Origin}

The \c{bin} format provides an additional directive to the list
given in \k{directive}: \c{ORG}. The function of the \c{ORG}
directive is to specify the origin address which NASM will assume
the program begins at when it is loaded into memory.

For example, the following code will generate the longword
\c{0x00000104}:

\c         org 0x100
\c         dd label
\c label:

Unlike the \c{ORG} directive provided by MASM-compatible assemblers,
which allows you to jump around in the object file and overwrite
code you have already generated, NASM's \c{ORG} does exactly what
the directive says: \e{origin}. Its sole function is to specify one
offset which is added to all internal address references within the
section; it does not permit any of the trickery that MASM's version
does. See \k{proborg} for further comments.
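
Because \c{ORG} only sets the assumed load address, any padding has to
be written out explicitly. The following sketch of a boot sector
illustrates one way to do this with \c{TIMES} and the \c{$} and \c{$$}
tokens instead of a second \c{ORG}. It assumes the conventional PC load
address \c{0x7C00} and a 512-byte sector ending in the \c{0xAA55}
signature; the label \c{start} is a placeholder:

\c         org 0x7C00              ; assumed load address of the sector
\c
\c start:  jmp start               ; placeholder for real boot code
\c
\c         times 510-($-$$) db 0   ; pad with zeros up to offset 510
\c         dw 0xAA55               ; signature in the last two bytes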


\S{binseg} \c{bin} Extensions to the \c{SECTION}
Directive\I{\c{SECTION}, \c{bin} extensions to}

The \c{bin} output format extends the \c{SECTION} (or \c{SEGMENT})
directive to allow you to specify the alignment requirements of
segments. This is done by appending the \i\c{ALIGN} qualifier to the
end of the section-definition line. For example,

\c section .data   align=16

switches to the section \c{.data} and also specifies that it must be
aligned on a 16-byte boundary.

The parameter to \c{ALIGN} is the required alignment in bytes, which
determines how many low bits of the section start address must be
forced to zero (\c{align=16} forces the low four bits to zero). The
alignment value given may be any power of two.\I{section alignment, in
bin}\I{segment alignment, in bin}\I{alignment, in bin sections}


\S{multisec} \i{Multisection}\I{bin, multisection} Support for the \c{bin} Format

The \c{bin} format allows the use of multiple sections, of arbitrary names,
besides the "known" \c{.text}, \c{.data}, and \c{.bss} names.

\b Sections may be designated \i\c{progbits} or \i\c{nobits}. Default
is \c{progbits} (except \c{.bss}, which defaults to \c{nobits},
of course).

\b Sections can be aligned at a specified boundary following the previous
section with \c{align=}, or at an arbitrary byte-granular position with
\i\c{start=}.

\b Sections can be given a virtual start address, which will be used
for the calculation of all memory references within that section
with \i\c{vstart=}.

\b Sections can be ordered using \i\c{follows=}\c{<section>} or
\i\c{vfollows=}\c{<section>} as an alternative to specifying an explicit
start address.

\b Arguments to \c{org}, \c{start}, \c{vstart}, and \c{align=} are
critical expressions. See \k{crit}. For example, in
\c{align=(1 << ALIGN_SHIFT)}, \c{ALIGN_SHIFT} must be defined before
it is used.

\b Any code which comes before an explicit \c{SECTION} directive
is directed by default into the \c{.text} section.

\b If an \c{ORG} statement is not given, \c{ORG 0} is used
by default.

\b The \c{.bss} section will be placed after the last \c{progbits}
section, unless \c{start=}, \c{vstart=}, \c{follows=}, or \c{vfollows=}
has been specified.

\b All sections are aligned on dword boundaries, unless a different
alignment has been specified.

\b Sections may not overlap.

\b NASM creates the symbol \c{section.<secname>.start} for each section,
which may be used in your code.
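
The following sketch combines several of the points above. The section
name \c{payload}, the label \c{payload_entry} and the address
\c{0x0600} are assumptions invented for this example. The \c{payload}
section is stored in the file directly after \c{.text}, but is
assembled as though it will run at \c{0x0600}, so real startup code
would have to copy it there before calling it:

\c         org 0x7C00
\c
\c section payload follows=.text vstart=0x0600
\c
\c payload_entry:
\c         ret
\c
\c section .text
\c
\c         mov si,section.payload.start ; where payload sits once loaded
\c         mov di,payload_entry         ; 0x0600, where it expects to run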

\S{map} \i{Map Files}

Map files can be generated in \c{-f bin} format by means of the \c{[map]}
option. Map types of \c{all} (default), \c{brief}, \c{sections}, \c{segments},
or \c{symbols} may be specified. Output may be directed to \c{stdout}
(default), \c{stderr}, or a specified file. E.g.
\c{[map symbols myfile.map]}. No "user form" exists; the square
brackets must be used.


\H{ithfmt} \i\c{ith}: \i{Intel Hex} Output

The \c{ith} file format produces Intel hex-format files. Just like the
\c{bin} format, this is a flat memory image format with no support for
further relocation or linking. It is usually used with ROM
programmers and similar utilities.

From a programmer's point of view, this behaves identically to the
\c{bin} format; the only difference is the encoding of the
output. All extensions supported by the \c{bin} file format are also
supported by the \c{ith} file format.

\c{ith} provides a default output file-name extension of \c{.ith}.


\H{srecfmt} \i\c{srec}: \i{Motorola S-Records} Output

The \c{srec} file format produces Motorola S-record files. Just like the
\c{bin} format, this is a flat memory image format with no support for
relocation or linking. It is usually used with ROM programmers and
similar utilities.

From a programmer's point of view, this behaves identically to the
\c{bin} format; the only difference is the encoding of the
output. All extensions supported by the \c{bin} file format are also
supported by the \c{srec} file format.

\c{srec} provides a default output file-name extension of \c{.srec}.


\H{objfmt} \i\c{obj}: \i{Microsoft OMF}\I{OMF} Object Files

The \c{obj} file format (NASM calls it \c{obj} rather than \c{omf}
for historical reasons) is the one produced by \i{MASM} and
\i{TASM}, which is typically fed to 16-bit DOS linkers to produce
\i\c{.EXE} files. It is also the format used by \i{OS/2}.

\c{obj} provides a default output file-name extension of \c{.obj}.

\c{obj} is not exclusively a 16-bit format, though: NASM has full
support for the 32-bit extensions to the format. In particular,
32-bit \c{obj} format files are used by \i{Borland's Win32
compilers}, instead of using Microsoft's newer \i\c{win32} object
file format.

The \c{obj} format does not define any special segment names: you
can call your segments anything you like. Typical names for segments
in \c{obj} format files are \c{CODE}, \c{DATA} and \c{BSS}.

If your source file contains code before specifying an explicit
\c{SEGMENT} directive, then NASM will invent its own segment called
\i\c{__NASMDEFSEG} for you.

When you define a segment in an \c{obj} file, NASM defines the
segment name as a symbol as well, so that you can access the segment
address of the segment. So, for example:

\c segment data
\c
\c dvar:   dw      1234
\c
\c segment code
\c
\c function:
\c         mov     ax,data         ; get segment address of data
\c         mov     ds,ax           ; and move it into DS
\c         inc     word [dvar]     ; now this reference will work
\c         ret

The \c{obj} format also enables the use of the \i\c{SEG} and
\i\c{WRT} operators, so that you can write code which does things
like

\c extern  foo
\c
\c         mov     ax,seg foo            ; get preferred segment of foo
\c         mov     ds,ax
\c         mov     ax,data               ; a different segment
\c         mov     es,ax
\c         mov     ax,[ds:foo]           ; this accesses `foo'
\c         mov     [es:foo wrt data],bx  ; so does this


\S{objseg} \c{obj} Extensions to the \c{SEGMENT}
Directive\I{SEGMENT, obj extensions to}

The \c{obj} output format extends the \c{SEGMENT} (or \c{SECTION})
directive to allow you to specify various properties of the segment
you are defining. This is done by appending extra qualifiers to the
end of the segment-definition line. For example,

\c segment code private align=16

defines the segment \c{code}, but also declares it to be a private
segment, and requires that the portion of it described in this code
module must be aligned on a 16-byte boundary.

The available qualifiers are:

\b \i\c{PRIVATE}, \i\c{PUBLIC}, \i\c{COMMON} and \i\c{STACK} specify
the combination characteristics of the segment. \c{PRIVATE} segments
do not get combined with any others by the linker; \c{PUBLIC} and
\c{STACK} segments get concatenated together at link time; and
\c{COMMON} segments all get overlaid on top of each other rather
than stuck end-to-end.

\b \i\c{ALIGN} is used, as shown above, to specify the required
alignment in bytes, which determines how many low bits of the segment
start address must be forced to zero. The alignment
value given may be any power of two from 1 to 4096; in reality, the
only values supported are 1, 2, 4, 16, 256 and 4096, so if 8 is
specified it will be rounded up to 16, and 32, 64 and 128 will all
be rounded up to 256, and so on. Note that alignment to 4096-byte
boundaries is a \i{PharLap} extension to the format and may not be
supported by all linkers.\I{section alignment, in OBJ}\I{segment
alignment, in OBJ}\I{alignment, in OBJ sections}

\b \i\c{CLASS} can be used to specify the segment class; this feature
indicates to the linker that segments of the same class should be
placed near each other in the output file. The class name can be any
word, e.g. \c{CLASS=CODE}.

\b \i\c{OVERLAY}, like \c{CLASS}, is specified with an arbitrary word
as an argument, and provides overlay information to an
overlay-capable linker.

\b Segments can be declared as \i\c{USE16} or \i\c{USE32}, which has
the effect of recording the choice in the object file and also
ensuring that NASM's default assembly mode when assembling in that
segment is 16-bit or 32-bit respectively.

\b When writing \i{OS/2} object files, you should declare 32-bit
segments as \i\c{FLAT}, which causes the default segment base for
anything in the segment to be the special group \c{FLAT}, and also
defines the group if it is not already defined.

\b The \c{obj} file format also allows segments to be declared as
having a pre-defined absolute segment address, although no linkers
are currently known to make sensible use of this feature;
nevertheless, NASM allows you to declare a segment such as
\c{SEGMENT SCREEN ABSOLUTE=0xB800} if you need to. The \i\c{ABSOLUTE}
and \c{ALIGN} keywords are mutually exclusive.

NASM's default segment attributes are \c{PUBLIC}, \c{ALIGN=1}, no
class, no overlay, and \c{USE16}.
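
As an illustrative sketch, a small 16-bit program might spell out its
segment qualifiers rather than rely on these defaults. The segment and
class names below are conventional choices assumed for the example;
the format itself does not require them:

\c segment code   public align=16 class=CODE use16
\c segment data   public align=2  class=DATA use16
\c segment bss    public align=2  class=BSS  use16
\c segment stack  stack  align=16 class=STACK
\c segment screen absolute=0xB800   ; no ALIGN here: it excludes ABSOLUTE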


\S{group} \i\c{GROUP}: Defining Groups of Segments\I{segments, groups of}

The \c{obj} format also allows segments to be grouped, so that a
single segment register can be used to refer to all the segments in
a group. NASM therefore supplies the \c{GROUP} directive, whereby
you can code

\c segment data
\c
\c         ; some data
\c
\c segment bss
\c
\c         ; some uninitialized data
\c
\c group dgroup data bss

which will define a group called \c{dgroup} to contain the segments
\c{data} and \c{bss}. Like \c{SEGMENT}, \c{GROUP} causes the group
name to be defined as a symbol, so that you can refer to a variable
\c{var} in the \c{data} segment as \c{var wrt data} or as \c{var wrt
dgroup}, depending on which segment value is currently in your
segment register.

If you just refer to \c{var}, however, and \c{var} is declared in a
segment which is part of a group, then NASM will default to giving
you the offset of \c{var} from the beginning of the \e{group}, not
the \e{segment}. Therefore \c{SEG var}, also, will return the group
base rather than the segment base.

NASM will allow a segment to be part of more than one group, but
will generate a warning if you do this. Variables declared in a
segment which is part of more than one group will default to being
relative to the first group that was defined to contain the segment.

A group does not have to contain any segments; you can still make
\c{WRT} references to a group which does not contain the variable
you are referring to. OS/2, for example, defines the special group
\c{FLAT} with no segments in it.


\S{uppercase} \i\c{UPPERCASE}: Disabling Case Sensitivity in Output

Although NASM itself is \i{case sensitive}, some OMF linkers are
not; therefore it can be useful for NASM to output single-case
object files. The \c{UPPERCASE} format-specific directive causes all
segment, group and symbol names that are written to the object file
to be forced to upper case just before being written. Within a
source file, NASM is still case-sensitive; but the object file can
be written entirely in upper case if desired.

\c{UPPERCASE} is used alone on a line; it requires no parameters.


\S{import} \i\c{IMPORT}: Importing DLL Symbols\I{DLL symbols,
importing}\I{symbols, importing from DLLs}

The \c{IMPORT} format-specific directive defines a symbol to be
imported from a DLL, for use if you are writing a DLL's \i{import
library} in NASM. You still need to declare the symbol as \c{EXTERN}
as well as using the \c{IMPORT} directive.

The \c{IMPORT} directive takes two required parameters, separated by
white space, which are (respectively) the name of the symbol you
wish to import and the name of the library you wish to import it
from. For example:

\c     import  WSAStartup wsock32.dll

A third optional parameter gives the name by which the symbol is
known in the library you are importing it from, in case this is not
the same as the name you wish the symbol to be known by to your code
once you have imported it. For example:

\c     import  asyncsel wsock32.dll WSAAsyncSelect
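
Since each imported symbol must also be declared \c{EXTERN}, the two
directives normally appear together. A sketch of how the two imports
above might look in a source file, with the \c{EXTERN} lines added for
illustration:

\c     extern  WSAStartup
\c     import  WSAStartup wsock32.dll
\c
\c     extern  asyncsel            ; local name used by this source file
\c     import  asyncsel wsock32.dll WSAAsyncSelect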


\S{export} \i\c{EXPORT}: Exporting DLL Symbols\I{DLL symbols,
exporting}\I{symbols, exporting from DLLs}

The \c{EXPORT} format-specific directive defines a global symbol to
be exported as a DLL symbol, for use if you are writing a DLL in
NASM. You still need to declare the symbol as \c{GLOBAL} as well as
using the \c{EXPORT} directive.

\c{EXPORT} takes one required parameter, which is the name of the
symbol you wish to export, as it was defined in your source file. An
optional second parameter (separated by white space from the first)
gives the \e{external} name of the symbol: the name by which you
wish the symbol to be known to programs using the DLL. If this name
is the same as the internal name, you may leave the second parameter
off.

Further parameters can be given to define attributes of the exported
symbol. These parameters, like the second, are separated by white
space. If further parameters are given, the external name must also
be specified, even if it is the same as the internal name. The
available attributes are:

\b \c{resident} indicates that the exported name is to be kept
resident by the system loader. This is an optimization for
frequently used symbols imported by name.

\b \c{nodata} indicates that the exported symbol is a function which
does not make use of any initialized data.

\b \c{parm=NNN}, where \c{NNN} is an integer, sets the number of
parameter words for the case in which the symbol is a call gate
between 32-bit and 16-bit segments.

\b An attribute which is just a number indicates that the symbol
should be exported with an identifying number (ordinal), and gives
the desired number.

For example:

\c     export  myfunc
\c     export  myfunc TheRealMoreFormalLookingFunctionName
\c     export  myfunc myfunc 1234  ; export by ordinal
\c     export  myfunc myfunc resident parm=23 nodata
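
In a complete source file, the \c{EXPORT} line sits alongside the
\c{GLOBAL} declaration and the definition of the symbol itself. The
following is only a sketch: the segment name and qualifiers are
assumptions, and the function body is a placeholder:

\c     global  myfunc
\c     export  myfunc
\c
\c segment code public use32 class=CODE
\c
\c myfunc: xor     eax,eax         ; placeholder body: return zero
\c         ret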


\S{dotdotstart} \i\c{..start}: Defining the \i{Program Entry
Point}

\c{OMF} linkers require exactly one of the object files being linked to
define the program entry point, where execution will begin when the
program is run. If the object file that defines the entry point is
assembled using NASM, you specify the entry point by declaring the
special symbol \c{..start} at the point where you wish execution to
begin.
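
As a sketch, the skeleton of a 16-bit DOS \c{.EXE} program using
\c{..start} might look like this. The segment names, the 64-byte stack
and the use of DOS function \c{0x4C} to terminate are assumptions made
for the example:

\c segment code
\c
\c ..start:
\c         mov     ax,data
\c         mov     ds,ax           ; DS = segment address of `data'
\c         mov     ax,stack
\c         mov     ss,ax
\c         mov     sp,stacktop     ; SS:SP = top of our stack
\c
\c         mov     ax,0x4C00       ; DOS: terminate, exit code 0
\c         int     0x21
\c
\c segment data
\c
\c segment stack stack
\c         resb 64
\c stacktop: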


\S{objextern} \c{obj} Extensions to the \c{EXTERN}
Directive\I{EXTERN, obj extensions to}

If you declare an external symbol with the directive

\c     extern  foo

then references such as \c{mov ax,foo} will give you the offset of
\c{foo} from its preferred segment base (as specified in whichever
module \c{foo} is actually defined in). So to access the contents of
\c{foo} you will usually need to do something like

\c         mov     ax,seg foo      ; get preferred segment base
\c         mov     es,ax           ; move it into ES
\c         mov     ax,[es:foo]     ; and use offset `foo' from it

This is a little unwieldy, particularly if you know that an external
is going to be accessible from a given segment or group, say
\c{dgroup}. So if \c{DS} already contained \c{dgroup}, you could
simply code

\c         mov     ax,[foo wrt dgroup]

However, having to type this every time you want to access \c{foo}
can be a pain; so NASM allows you to declare \c{foo} in the
alternative form

\c     extern  foo:wrt dgroup

This form causes NASM to pretend that the preferred segment base of
\c{foo} is in fact \c{dgroup}; so the expression \c{seg foo} will
now return \c{dgroup}, and the expression \c{foo} is equivalent to
\c{foo wrt dgroup}.

This \I{default-WRT mechanism}default-\c{WRT} mechanism can be used
to make externals appear to be relative to any group or segment in
your program. It can also be applied to common variables: see
\k{objcommon}.


\S{objcommon} \c{obj} Extensions to the \c{COMMON}
Directive\I{COMMON, obj extensions to}

The \c{obj} format allows common variables to be either near\I{near
common variables} or far\I{far common variables}; NASM allows you to
specify which your variables should be by the use of the syntax

\c common  nearvar 2:near   ; `nearvar' is a near common
\c common  farvar  10:far   ; and `farvar' is far

Far common variables may be greater in size than 64KB, and so the
OMF specification says that they are declared as a number of
\e{elements} of a given size. So a 10-byte far common variable could
be declared as ten one-byte elements, five two-byte elements, two
five-byte elements or one ten-byte element.

Some \c{OMF} linkers require the \I{element size, in common
variables}\I{common variables, element size}element size, as well as
the variable size, to match when resolving common variables declared
in more than one module. Therefore NASM must allow you to specify
the element size on your far common variables. This is done by the
following syntax:

\c common  c_5by2  10:far 5        ; two five-byte elements
\c common  c_2by5  10:far 2        ; five two-byte elements

If no element size is specified, the default is 1. Also, the \c{FAR}
keyword is not required when an element size is specified, since
only far commons may have element sizes at all. So the above
declarations could equivalently be

\c common  c_5by2  10:5            ; two five-byte elements
\c common  c_2by5  10:2            ; five two-byte elements

In addition to these extensions, the \c{COMMON} directive in \c{obj}
also supports default-\c{WRT} specification like \c{EXTERN} does
(explained in \k{objextern}). So you can also declare things like

\c common  foo     10:wrt dgroup
\c common  bar     16:far 2:wrt data
\c common  baz     24:wrt data:6


\S{objdepend} Embedded File Dependency Information

Since NASM 2.13.02, \c{obj} files contain embedded dependency file
information. To suppress the generation of dependencies, use

\c %pragma obj nodepend


\H{win32fmt} \i\c{win32}: Microsoft Win32 Object Files

The \c{win32} output format generates Microsoft Win32 object files,
suitable for passing to Microsoft linkers such as \i{Visual C++}.
Note that Borland Win32 compilers do not use this format, but use
\c{obj} instead (see \k{objfmt}).

\c{win32} provides a default output file-name extension of \c{.obj}.

Note that although Microsoft say that Win32 object files follow the
\c{COFF} (Common Object File Format) standard, the object files produced
by Microsoft Win32 compilers are not compatible with COFF linkers
such as DJGPP's, and vice versa. This is due to a difference of
opinion over the precise semantics of PC-relative relocations. To
produce COFF files suitable for DJGPP, use NASM's \c{coff} output
format; conversely, the \c{coff} format does not produce object
files that Win32 linkers can generate correct output from.


\S{win32sect} \c{win32} Extensions to the \c{SECTION}
Directive\I{SECTION, Windows extensions to}

Like the \c{obj} format, \c{win32} allows you to specify additional
information on the \c{SECTION} directive line, to control the type
and properties of sections you declare. Section types and properties
are generated automatically by NASM for the \i{standard section names}
\c{.text}, \c{.data} and \c{.bss}, but may still be overridden by
these qualifiers.

The available qualifiers are:

\b \c{code}, or equivalently \c{text}, defines the section to be a
code section. This marks the section as readable and executable, but
not writable, and also indicates to the linker that the type of the
section is code.

\b \c{data} and \c{bss} define the section to be a data section,
analogously to \c{code}. Data sections are marked as readable and
writable, but not executable. \c{data} declares an initialized data
section, whereas \c{bss} declares an uninitialized data section.

\b \c{rdata} declares an initialized data section that is readable
but not writable. Microsoft compilers place constants in this
section.

\b \c{info} defines the section to be an \i{informational section},
which is not included in the executable file by the linker, but may
(for example) pass information \e{to} the linker. For example,
declaring an \c{info}-type section called \i\c{.drectve} causes the
linker to interpret the contents of the section as command-line
options.

\b \c{align=}, used with a trailing number as in \c{obj}, gives the
\I{section alignment, in win32}\I{alignment, in win32
sections}alignment requirements of the section. The maximum you may
specify is 64: the Win32 object file format contains no means to
request a greater section alignment than this. If alignment is not
explicitly specified, the defaults are 16-byte alignment for code
sections, 8-byte alignment for rdata sections and 4-byte alignment
for data (and BSS) sections.
Informational sections get a default alignment of 1 byte (no
alignment), though the value does not matter.

\b \I{comdat, win32 attribute}\c{comdat=}, followed by a number
("selection"), a colon (acting as a separator) and a name,
marks the section as a \I{COMDAT section, in win32}"COMDAT section".
It allows Microsoft linkers to perform function-level linking,
to deal with multiply defined symbols, and to eliminate dead code/data.
The "selection" number should be one of the
\c{IMAGE_COMDAT_SELECT_*} constants from the
\W{https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src/Debug/pe-format.md#comdat-sections-object-only}\c{COFF format specification};
this value controls whether the linker allows multiply defined symbols
and how it handles them.
The name is the \I{COMDAT symbol, in win32}"COMDAT symbol",
which is basically a new name for the section. So even though you have one
section given by the main name (e.g. \c{.text}), it can actually
consist of hundreds of COMDAT sections, each having its own name
(and alignment).
When the "selection" is \c{IMAGE_COMDAT_SELECT_ASSOCIATIVE} (5),
the following name is the "COMDAT symbol" of the associated COMDAT
section; this way you can link a piece of code or data only when
another piece of code or data actually gets linked.

\> So, when linking a NASM-compiled file with some C code,
the source may be structured as follows.
Note that the default \c{.text} section is handled in a special
way and it doesn't work well with \c{comdat}; you may want to append
a \c{$} character and an arbitrary suffix to the section name.
It will get linked into the \c{.text} section anyway; see the info on
\W{https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src/Debug/pe-format.md#grouped-sections-object-only}\c{Grouped Sections}.

\c section .text$1 align=16 comdat=1:FirstFnc
\c         ...     ; Code linked only if referenced from C
\c
\c section .text$1 align=16 comdat=1:SecondFnc
\c         ...     ; Code linked only if referenced from C
\c
\c section .rdata align=32 comdat=5:FirstFnc
\c         ...     ; Data linked only if the related code (FirstFnc) is linked
\c

The defaults assumed by NASM if you do not specify the above
qualifiers are:

\c section .text    code  align=16
\c section .data    data  align=4
\c section .rdata   rdata align=8
\c section .bss     bss   align=4

The \c{win64} format also adds:

\c section .pdata   rdata align=4
\c section .xdata   rdata align=8

Any other section name is treated by default like \c{.text}.
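
Because of this default, a section with a non-standard name that is
meant to hold data needs explicit qualifiers. In the following sketch
the section names \c{consts} and \c{scratch} and the labels are
invented for the example; without the qualifiers, both sections would
be treated as code:

\c section consts  rdata align=16  ; read-only initialized data
\c table:  dd      1, 2, 3, 4
\c
\c section scratch bss   align=16  ; uninitialized, writable data
\c buffer: resb    256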

\S{win32safeseh} \c{win32}: Safe Structured Exception Handling

Among other improvements in Windows XP SP2 and Windows Server 2003,
Microsoft introduced the concept of "safe structured exception
handling." The general idea is to collect the handlers' entry points in
a designated read-only table and to have each alleged entry point
verified against this table before exception control is passed to the
handler. In order for an executable module to be equipped with such a
"safe exception handler table," all object modules on the linker
command line have to comply with certain criteria. If a single module
among them does not, the table in question is omitted and the
above-mentioned run-time checks will not be performed for the
application in question. Table omission is silent by default and can
therefore easily be overlooked. One can instruct the linker to refuse
to produce a binary without such a table by passing the
\c{/safeseh} command line option.

Regardless of the merits of this run-time check, it is natural to expect
NASM to be capable of generating modules suitable for \c{/safeseh}
linking. From the developer's viewpoint the problem is two-fold:

\b how to adapt modules not deploying exception handlers of their own;

\b how to adapt/develop modules utilizing custom exception handling.

The former can easily be achieved with any NASM version by adding the
following line to the source code:

\c $@feat.00 equ 1

As of version 2.03, NASM adds this absolute symbol automatically if it
is not already present. That is, if for whatever reason the developer
chooses to assign another value in the source file, that remains
perfectly possible.

Registering a custom exception handler, on the other hand, requires
certain "magic." As of version 2.03 an additional directive,
\c{safeseh}, is implemented, which instructs the assembler to produce
appropriately formatted input data for the above-mentioned "safe
exception handler table." Its typical use would be:

\c section .text
\c extern  _MessageBoxA@16
\c %if     __?NASM_VERSION_ID?__ >= 0x02030000
\c safeseh handler         ; register handler as "safe handler"
\c %endif
\c handler:
\c         push    DWORD 1 ; MB_OKCANCEL
\c         push    DWORD caption
\c         push    DWORD text
\c         push    DWORD 0
\c         call    _MessageBoxA@16
\c         sub     eax,1   ; incidentally suits as return value
\c                         ; for exception handler
\c         ret
\c global  _main
\c _main:
\c         push    DWORD handler
\c         push    DWORD [fs:0]
\c         mov     DWORD [fs:0],esp ; engage exception handler
\c         xor     eax,eax
\c         mov     eax,DWORD[eax]   ; cause exception
\c         pop     DWORD [fs:0]     ; disengage exception handler
\c         add     esp,4
\c         ret
\c text:   db      'OK to rethrow, CANCEL to generate core dump',0
\c caption:db      'SEGV',0
\c
\c section .drectve info
\c         db      '/defaultlib:user32.lib /defaultlib:msvcrt.lib '

As you might imagine, it is perfectly possible to produce an .exe binary
with a "safe exception handler table" and yet engage an unregistered
exception handler. Indeed, a handler is engaged simply by manipulating
the \c{[fs:0]} location at run-time, something the linker has no power
over. It should be explicitly mentioned that such a failure
to register the handler's entry point with the \c{safeseh} directive has
an undesired side effect at run-time. If an exception is raised and an
unregistered handler is to be executed, the application is abruptly
terminated without any notification whatsoever. One can argue that the
system could at least have logged some kind of "non-safe exception
handler in x.exe at address n" message in the event log, but no:
literally no notification is provided, and the user is left with no clue
as to what caused the application failure.

Finally, all mentions of the linker in this section refer to Microsoft
linker version 7.x and later. The presence of the \c{@feat.00} symbol
and the input data for the "safe exception handler table" causes no
backward incompatibilities, and "safeseh" modules generated by NASM 2.03
and later can still be linked by earlier versions or non-Microsoft
linkers.
|
|
|
|
\S{codeview} Debugging formats for Windows
|
|
\I{Windows debugging formats}
|
|
|
|
The \c{win32} and \c{win64} formats support the Microsoft \i{CodeView
|
|
debugging format}. Currently, the CodeView version 8 format is supported
|
|
(\i\c{cv8}), but newer versions of the CodeView debugger should be
|
|
able to handle this format as well.
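
As a brief illustration, CodeView output can be requested on the
command line by combining the \c{-g} option with the \c{-F}
debug-format selector. The file name \c{myfile.asm} is only a
placeholder:

\c nasm -f win64 -g -F cv8 myfile.asm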
|
|
|
|
|
|
\H{win64fmt} \i\c{win64}: Microsoft Win64 Object Files
|
|
|
|
The \c{win64} output format generates Microsoft Win64 object files,
which are nearly identical to the \c{win32} object format (\k{win32fmt}),
except that they target 64-bit code and the x86-64 platform. Apart from
this difference, the format is used in NASM exactly like the \c{win32}
object format (\k{win32fmt}).
|
|
|
|
\S{win64pic} \c{win64}: Writing Position-Independent Code
|
|
|
|
While \c{REL} takes good care of RIP-relative addressing, there is one
|
|
aspect that is easy to overlook for a Win64 programmer: indirect
|
|
references. Consider a switch dispatch table:
|
|
|
|
\c jmp qword [dsptch+rax*8]
|
|
\c ...
|
|
\c dsptch: dq case0
|
|
\c dq case1
|
|
\c ...
|
|
|
|
Even a novice Win64 assembler programmer will soon realize that the code
is not 64-bit savvy. Most notably, the linker will refuse to link it with
|
|
|
|
\c 'ADDR32' relocation to '.text' invalid without /LARGEADDRESSAWARE:NO
|
|
|
|
So the programmer will have to split the \c{jmp} instruction as follows:
|
|
|
|
\c lea rbx,[rel dsptch]
|
|
\c jmp qword [rbx+rax*8]
|
|
|
|
What happens behind the scenes is that the effective address in \c{lea}
is encoded relative to the instruction pointer, that is, in a perfectly
position-independent manner. But this is only part of the problem!
The trouble is that in a .dll context the \c{caseN} relocations will
make their way to the final module and might have to be adjusted at
.dll load time, specifically when the module cannot be loaded at its
preferred address. When this occurs, pages with such relocations are
rendered private to the current process, which undermines the idea of
sharing the .dll. Fortunately, this is trivial to fix:
|
|
|
|
\c lea rbx,[rel dsptch]
|
|
\c add rbx,[rbx+rax*8]
|
|
\c jmp rbx
|
|
\c ...
|
|
\c dsptch: dq case0-dsptch
|
|
\c dq case1-dsptch
|
|
\c ...
|
|
|
|
NASM version 2.03 and later provides another alternative, the
\c{wrt ..imagebase} operator, which returns the offset from the base
address of the current image, be it an .exe or a .dll module, hence the
name. For those acquainted with the PE-COFF format, the base address
denotes the start of the \c{IMAGE_DOS_HEADER} structure. Here is how to
implement a switch with these image-relative references:
|
|
|
|
\c lea rbx,[rel dsptch]
|
|
\c mov eax,[rbx+rax*4]
|
|
\c sub rbx,dsptch wrt ..imagebase
|
|
\c add rbx,rax
|
|
\c jmp rbx
|
|
\c ...
|
|
\c dsptch: dd case0 wrt ..imagebase
|
|
\c dd case1 wrt ..imagebase
|
|
|
|
One can argue that the operator is redundant. Indeed, the snippet
before last works just fine with any NASM version and is not even
Windows-specific. The real reason for implementing \c{wrt ..imagebase}
will become apparent in the next section.
|
|
|
|
It should be noted that \c{wrt ..imagebase} is defined as a 32-bit
|
|
operand only:
|
|
|
|
\c dd label wrt ..imagebase ; ok
|
|
\c dq label wrt ..imagebase ; bad
|
|
\c mov eax,label wrt ..imagebase ; ok
|
|
\c mov rax,label wrt ..imagebase ; bad
|
|
|
|
\S{win64seh} \c{win64}: Structured Exception Handling
|
|
|
|
Structured exception handling in Win64 is a completely different matter
from Win32. Upon an exception, the program counter value is noted, and
a linker-generated table comprising the start and end addresses of all
the functions [in the given executable module] is traversed and
compared to the saved program counter. This is how the so-called
\c{UNWIND_INFO} structure is identified. If it is not found, the
offending subroutine is assumed to be a "leaf" and the lookup procedure
just described is attempted for its caller. In Win64, a leaf function
is one that neither calls any other function \e{nor} modifies any Win64
non-volatile registers, including the stack pointer. The latter ensures
that it is possible to identify a leaf function's caller by simply
pulling the value from the top of the stack.
|
|
|
|
While the majority of subroutines written in assembler do not call any
other function, the requirement that non-volatile registers remain
unmodified leaves the developer with no more than 7 registers and no
stack frame, which is not necessarily what [s]he counted on.
Customarily one would meet the requirement by saving non-volatile
registers on the stack and restoring them upon return, so what can go
wrong? If [and only if] an exception is raised at run-time and no
\c{UNWIND_INFO} structure is associated with such a "leaf" function,
the stack unwind procedure will expect to find the caller's return
address on the top of the stack, immediately followed by its frame.
Given that the developer pushed the caller's non-volatile registers on
the stack, would the value on top point at some code segment, or even
at addressable space? Well, the developer can attempt copying the
caller's return address to the top of the stack, and this would
actually work in some very specific circumstances. But unless the
developer can guarantee that these circumstances are always met, it is
more appropriate to assume the worst-case scenario, i.e. the stack
unwind procedure going berserk. The relevant question is: what happens
then? The application is abruptly terminated without any notification
whatsoever. Just as in the Win32 case, one can argue that the system
could at least have logged "unwind procedure went berserk in x.exe at
address n" in the event log, but no, no trace of the failure is left.
|
|
|
|
Now that we understand the significance of the \c{UNWIND_INFO}
structure, let's discuss what is in it and how it is processed. First
of all, it is checked for the presence of a reference to a custom
language-specific exception handler. If there is one, it is invoked.
Depending on the return value, either execution flow is resumed (the
exception is said to be "handled"), \e{or} the rest of the
\c{UNWIND_INFO} structure is processed as follows. Besides the optional
reference to a custom handler, it carries information about the current
callee's stack frame and where non-volatile registers are saved. The
information is detailed enough to reconstruct the contents of the
caller's non-volatile registers as they were upon the call to the
current callee. And so the caller's context is reconstructed, and then
the unwind procedure is repeated, i.e. another \c{UNWIND_INFO}
structure is looked up, this time for the caller's instruction pointer,
and checked for the presence of a reference to a language-specific
handler, etc. The procedure is repeated recursively until the exception
is handled. As a last resort, the system "handles" it by generating a
memory core dump and terminating the application.
|
|
|
|
At the time of this writing, NASM unfortunately does not facilitate
generation of the above-mentioned detailed information about the stack
frame layout. But as of version 2.03 it implements building blocks for
generating the structures involved in stack unwinding. As the simplest
example, here is how to deploy a custom exception handler for a leaf
function:
|
|
|
|
\c default rel
|
|
\c section .text
|
|
\c extern MessageBoxA
|
|
\c handler:
|
|
\c sub rsp,40
|
|
\c mov rcx,0
|
|
\c lea rdx,[text]
|
|
\c lea r8,[caption]
|
|
\c mov r9,1 ; MB_OKCANCEL
|
|
\c call MessageBoxA
|
|
\c sub eax,1 ; incidentally suits as return value
|
|
\c ; for exception handler
|
|
\c add rsp,40
|
|
\c ret
|
|
\c global main
|
|
\c main:
|
|
\c xor rax,rax
|
|
\c mov rax,QWORD[rax] ; cause exception
|
|
\c ret
|
|
\c main_end:
|
|
\c text: db 'OK to rethrow, CANCEL to generate core dump',0
|
|
\c caption:db 'SEGV',0
|
|
\c
|
|
\c section .pdata rdata align=4
|
|
\c dd main wrt ..imagebase
|
|
\c dd main_end wrt ..imagebase
|
|
\c dd xmain wrt ..imagebase
|
|
\c section .xdata rdata align=8
|
|
\c xmain: db 9,0,0,0
|
|
\c dd handler wrt ..imagebase
|
|
\c section .drectve info
|
|
\c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib '
|
|
|
|
What you see in the \c{.pdata} section is an element of the "table
comprising start and end addresses of functions" along with a reference
to the associated \c{UNWIND_INFO} structure. And what you see in the
\c{.xdata} section is an \c{UNWIND_INFO} structure describing a
function with no frame, but with a designated exception handler.
References are \e{required} to be image-relative (which is the real
reason for implementing the \c{wrt ..imagebase} operator). It should be
noted that \c{rdata align=n}, as well as \c{wrt ..imagebase}, are
optional in the context of these two segments, i.e. can be omitted. The
latter means that \e{all} 32-bit references placed into these two
segments, not only the required ones listed above, turn out
image-relative. Why is this important to understand? The developer is
allowed to append handler-specific data to the \c{UNWIND_INFO}
structure, and if [s]he adds a 32-bit reference, [s]he will have
to remember to adjust its value to obtain the real pointer.
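
To make the layout above more concrete, here is an annotated sketch of
the same two records for a hypothetical leaf function \c{myfunc}. The
labels \c{myfunc}, \c{myfunc_end}, \c{xmyfunc} and \c{myhandler} are
placeholders, not names that NASM defines. The first byte of
\c{UNWIND_INFO} packs the version (1) into its low three bits and the
flags into its high five bits, so \c{UNW_FLAG_EHANDLER} (1) yields
1+(1<<3) = 9:

\c section .pdata  rdata align=4
\c         dd      myfunc     wrt ..imagebase ; function start
\c         dd      myfunc_end wrt ..imagebase ; function end
\c         dd      xmyfunc    wrt ..imagebase ; its UNWIND_INFO
\c section .xdata  rdata align=8
\c xmyfunc:
\c         db      1+(1<<3)    ; version 1, UNW_FLAG_EHANDLER
\c         db      0           ; size of prolog
\c         db      0           ; count of unwind codes
\c         db      0           ; frame register and offset: none
\c         dd      myhandler wrt ..imagebase  ; exception handler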
|
|
|
|
As already mentioned, in Win64 terms a leaf function is one that
neither calls any other function \e{nor} modifies any non-volatile
register, including the stack pointer. But it is not uncommon for an
assembler programmer to plan to utilize every single register and
sometimes even to have a variable stack frame. Is there anything one
can do with the bare building blocks, i.e. besides manually composing a
fully-fledged \c{UNWIND_INFO} structure, which would surely be
considered error-prone? Yes, there is. Recall that the exception
handler is called first, before the stack layout is analyzed. As it
turns out, it is perfectly possible to manipulate the current callee's
context in a custom handler in a manner that permits further stack
unwinding. The general idea is that the handler would not actually
"handle" the exception, but instead restore the callee's context as it
was at its entry point and thus mimic a leaf function. In other words,
the handler would simply undertake part of the unwinding procedure.
Consider the following example:
|
|
|
|
\c function:
|
|
\c mov rax,rsp ; copy rsp to volatile register
|
|
\c push r15 ; save non-volatile registers
|
|
\c push rbx
|
|
\c push rbp
|
|
\c mov r11,rsp ; prepare variable stack frame
|
|
\c sub r11,rcx
|
|
\c and r11,-64
|
|
\c mov QWORD[r11],rax ; check for exceptions
|
|
\c mov rsp,r11 ; allocate stack frame
|
|
\c mov QWORD[rsp],rax ; save original rsp value
|
|
\c magic_point:
|
|
\c ...
|
|
\c mov r11,QWORD[rsp] ; pull original rsp value
|
|
\c mov rbp,QWORD[r11-24]
|
|
\c mov rbx,QWORD[r11-16]
|
|
\c mov r15,QWORD[r11-8]
|
|
\c mov rsp,r11 ; destroy frame
|
|
\c ret
|
|
|
|
The key point is that up to \c{magic_point} the original \c{rsp} value
remains in the chosen volatile register and no non-volatile register,
except for \c{rsp}, is modified, while past \c{magic_point} \c{rsp}
remains constant until the very end of the \c{function}. In this case
the custom language-specific exception handler would look like this:
|
|
|
|
\c EXCEPTION_DISPOSITION handler (EXCEPTION_RECORD *rec,ULONG64 frame,
|
|
\c CONTEXT *context,DISPATCHER_CONTEXT *disp)
|
|
\c { ULONG64 *rsp;
|
|
\c if (context->Rip<(ULONG64)magic_point)
|
|
\c rsp = (ULONG64 *)context->Rax;
|
|
\c else
|
|
\c { rsp = ((ULONG64 **)context->Rsp)[0];
|
|
\c context->Rbp = rsp[-3];
|
|
\c context->Rbx = rsp[-2];
|
|
\c context->R15 = rsp[-1];
|
|
\c }
|
|
\c context->Rsp = (ULONG64)rsp;
|
|
\c
|
|
\c memcpy (disp->ContextRecord,context,sizeof(CONTEXT));
|
|
\c RtlVirtualUnwind(UNW_FLAG_NHANDLER,disp->ImageBase,
|
|
\c         disp->ControlPc,disp->FunctionEntry,disp->ContextRecord,
|
|
\c &disp->HandlerData,&disp->EstablisherFrame,NULL);
|
|
\c return ExceptionContinueSearch;
|
|
\c }
|
|
|
|
As the custom handler mimics a leaf function, the corresponding
\c{UNWIND_INFO} structure does not have to contain any information
about the stack frame and its layout.
|
|
|
|
\H{cofffmt} \i\c{coff}: \i{Common Object File Format}
|
|
|
|
The \c{coff} output type produces \c{COFF} object files suitable for
|
|
linking with the \i{DJGPP} linker.
|
|
|
|
\c{coff} provides a default output file-name extension of \c{.o}.
|
|
|
|
The \c{coff} format supports the same extensions to the \c{SECTION}
|
|
directive as \c{win32} does, except that the \c{align} qualifier and
|
|
the \c{info} section type are not supported.
|
|
|
|
\H{machofmt} \I{Mach-O}\i\c{macho32} and \i\c{macho64}: \i{Mach Object File Format}
|
|
|
|
The \c{macho32} and \c{macho64} output formats produce Mach-O
|
|
object files suitable for linking with the \i{MacOS X} linker.
|
|
\i\c{macho} is a synonym for \c{macho32}.
|
|
|
|
\c{macho} provides a default output file-name extension of \c{.o}.
|
|
|
|
\S{machosect} \c{macho} extensions to the \c{SECTION} Directive
|
|
\I{SECTION, macho extensions to}
|
|
|
|
The \c{macho} output format specifies section names in the format
|
|
"\e{segment}\c{,}\e{section}". No spaces are allowed around the
|
|
comma. The following flags can also be specified:
|
|
|
|
\b \c{data} - this section contains initialized data items
|
|
|
|
\b \c{code} - this section contains code exclusively
|
|
|
|
\b \c{mixed} - this section contains both code and data
|
|
|
|
\b \c{bss} - this section is uninitialized and filled with zeros
|
|
|
|
\b \c{zerofill} - same as \c{bss}
|
|
|
|
\b \c{no_dead_strip} - inhibit dead code stripping for this section
|
|
|
|
\b \c{live_support} - set the live support flag for this section
|
|
|
|
\b \c{strip_static_syms} - strip static symbols for this section
|
|
|
|
\b \c{debug} - this section contains debugging information
|
|
|
|
\b \c{align=}\e{alignment} - specify section alignment
|
|
|
|
The default is \c{data}, unless the section name is \c{__text} or
|
|
\c{__bss} in which case the default is \c{text} or \c{bss},
|
|
respectively.
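
As an illustration, section declarations combining a segment and
section name with some of the flags above might look like the
following sketch. The section names \c{__mydata} and \c{__mycode} are
made up for the example:

\c section __DATA,__mydata data align=16
\c section __TEXT,__mycode code no_dead_strip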
|
|
|
|
For compatibility with other Unix platforms, the following standard
|
|
names are also supported:
|
|
|
|
\c .text = __TEXT,__text text
|
|
\c .rodata = __DATA,__const data
|
|
\c .data = __DATA,__data data
|
|
\c .bss = __DATA,__bss bss
|
|
|
|
If the \c{.rodata} section contains no relocations, it is instead put
|
|
into the \c{__TEXT,__const} section unless this section has already
|
|
been specified explicitly. However, it is probably better to specify
|
|
\c{__TEXT,__const} and \c{__DATA,__const} explicitly as appropriate.
|
|
|
|
\S{machotls} \i{Thread Local Storage in Mach-O}\I{TLS}: \c{macho} special
|
|
symbols and \i\c{WRT}
|
|
|
|
Mach-O defines the following special symbols that can be used on the
|
|
right-hand side of the \c{WRT} operator:
|
|
|
|
\b \c{..tlvp} is used to specify access to thread-local storage.
|
|
|
|
\b \c{..gotpcrel} is used to specify references to the Global Offset
|
|
Table. The GOT is supported in the \c{macho64} format only.
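
For instance, in \c{macho64} the address of an external symbol can be
loaded through its GOT entry as sketched below. Here \c{_extvar} is a
placeholder name for a symbol assumed to be defined in another module:

\c         extern  _extvar
\c
\c         mov     rax,[rel _extvar wrt ..gotpcrel] ; address of _extvar
\c         mov     eax,[rax]                        ; value stored there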
|
|
|
|
\S{macho-ssvs} \c{macho} specific directive \i\c{subsections_via_symbols}
|
|
|
|
The directive \c{subsections_via_symbols} sets the
\c{MH_SUBSECTIONS_VIA_SYMBOLS} flag in the Mach-O header, which
effectively separates a block (or a subsection) based on a symbol. It
is often used to let the linker eliminate dead code.
|
|
|
|
This directive takes no arguments.
|
|
|
|
This is a macro implemented as a \c{%pragma}. It can also be
|
|
specified in its \c{%pragma} form, in which case it will not affect
|
|
non-Mach-O builds of the same source code:
|
|
|
|
\c %pragma macho subsections_via_symbols
|
|
|
|
\S{macho-ssvs} \c{macho} specific directive \i\c{no_dead_strip}
|
|
|
|
The directive \c{no_dead_strip} sets the Mach-O \c{SH_NO_DEAD_STRIP}
|
|
section flag on the section containing a specific symbol. This
|
|
directive takes a list of symbols as its arguments.
|
|
|
|
This is a macro implemented as a \c{%pragma}. It can also be
|
|
specified in its \c{%pragma} form, in which case it will not affect
|
|
non-Mach-O builds of the same source code:
|
|
|
|
\c %pragma macho no_dead_strip symbol...
|
|
|
|
\S{macho-pext} \c{macho} specific extensions to the \c{GLOBAL}
|
|
Directive: \i\c{private_extern}
|
|
|
|
The \c{private_extern} extension to \c{GLOBAL} marks the symbol with limited
|
|
global scope. For example, you can specify the global symbol with
|
|
this extension:
|
|
|
|
\c global foo:private_extern
|
|
\c foo:
|
|
\c     ; code
|
|
|
|
Linking with the static linker clears the private extern attribute,
unless a linker option such as \c{-keep_private_externs} is used to
preserve it.
|
|
|
|
\H{elffmt} \i\c{elf32}, \i\c{elf64}, \i\c{elfx32}: \I{ELF}\I{linux, elf}\i{Executable and Linkable
|
|
Format} Object Files
|
|
|
|
The \c{elf32}, \c{elf64} and \c{elfx32} output formats generate
|
|
\c{ELF32} and \c{ELF64} (Executable and Linkable Format) object files, as
|
|
used by Linux as well as \i{Unix System V}, including \i{Solaris x86},
|
|
\i{UnixWare} and \i{SCO Unix}. ELF provides a default output
|
|
file-name extension of \c{.o}. \c{elf} is a synonym for \c{elf32}.
|
|
|
|
The \c{elfx32} format is used for the \i{x32} ABI, which is a 32-bit
|
|
ABI with the CPU in 64-bit mode.
|
|
|
|
\S{abisect} ELF specific directive \i\c{osabi}
|
|
|
|
The ELF header specifies the application binary interface for the
|
|
target operating system (OSABI). This field can be set by using the
|
|
\c{osabi} directive with the numeric value (0-255) of the target
|
|
system. If this directive is not used, the default value will be "UNIX
|
|
System V ABI" (0) which will work on most systems which support ELF.
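
For example, to mark an object file as targeting FreeBSD, whose OSABI
value in the ELF specification is 9, one could write the following.
The exact operand form is an assumption based on the description
above:

\c         osabi 9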
|
|
|
|
\S{elfsect} ELF extensions to the \c{SECTION} Directive
|
|
\I{SECTION, ELF extensions to}
|
|
|
|
Like the \c{obj} format, \c{elf} allows you to specify additional
|
|
information on the \c{SECTION} directive line, to control the type
|
|
and properties of sections you declare. Section types and properties
|
|
are generated automatically by NASM for the \i{standard section
|
|
names}, but may still be
|
|
overridden by these qualifiers.
|
|
|
|
The available qualifiers are:
|
|
|
|
\b \i\c{alloc} defines the section to be one which is loaded into
|
|
memory when the program is run. \i\c{noalloc} defines it to be one
|
|
which is not, such as an informational or comment section.
|
|
|
|
\b \i\c{exec} defines the section to be one which should have execute
|
|
permission when the program is run. \i\c{noexec} defines it as one
|
|
which should not.
|
|
|
|
\b \i\c{write} defines the section to be one which should be writable
|
|
when the program is run. \i\c{nowrite} defines it as one which should
|
|
not.
|
|
|
|
\b \i\c{progbits} defines the section to be one with explicit contents
|
|
stored in the object file: an ordinary code or data section, for
|
|
example.
|
|
|
|
\b \i\c{nobits} defines the section to be one with no explicit
|
|
contents given, such as a BSS section.
|
|
|
|
\b \i\c{note} indicates that this section contains ELF notes. The
|
|
contents of ELF notes are specified using normal assembly instructions;
|
|
it is up to the programmer to ensure these are valid ELF notes.
|
|
|
|
\b \i\c{preinit_array} indicates that this section contains function
|
|
addresses to be called before any other initialization has happened.
|
|
|
|
\b \i\c{init_array} indicates that this section contains function
|
|
addresses to be called during initialization.
|
|
|
|
\b \i\c{fini_array} indicates that this section contains function
|
|
pointers to be called during termination.
|
|
|
|
\b \I{align, ELF attribute}\c{align=}, used with a trailing number as in \c{obj}, gives the
|
|
\I{section alignment, in elf}\I{alignment, in elf sections}alignment
|
|
requirements of the section.
|
|
|
|
\b \c{byte}, \c{word}, \c{dword}, \c{qword}, \c{tword}, \c{oword},
|
|
\c{yword}, or \c{zword} with an optional \c{*}\i{multiplier} specify
|
|
the fundamental data item size for a section which contains either
|
|
fixed-sized data structures or strings; it also sets a default
|
|
alignment. This is generally used with the \c{strings} and \c{merge}
|
|
attributes (see below.) For example \c{byte*4} defines a unit size of
|
|
4 bytes, with a default alignment of 1; \c{dword} also defines a unit
|
|
size of 4 bytes, but with a default alignment of 4. The \c{align=}
|
|
attribute, if specified, overrides this default alignment.
|
|
|
|
\b \I{pointer, ELF attribute}\c{pointer} is equivalent to \c{dword}
|
|
for \c{elf32} or \c{elfx32}, and \c{qword} for \c{elf64}.
|
|
|
|
\b \I{strings, ELF attribute}\c{strings} indicates that this section
|
|
contains exclusively null-terminated strings. By default these are
|
|
assumed to be byte strings, but a size specifier can be used to
|
|
override that.
|
|
|
|
\b \i\c{merge} indicates that duplicate data elements in this section
|
|
should be merged with data elements from other object files. Data
|
|
elements can be either fixed-sized objects or null-terminated strings
|
|
(with the \c{strings} attribute.) A size specifier is required unless
|
|
\c{strings} is specified, in which case the size defaults to \c{byte}.
|
|
|
|
\b \i\c{tls} defines the section to be one which contains
|
|
thread local variables.
|
|
|
|
The defaults assumed by NASM if you do not specify the above
|
|
qualifiers are:
|
|
|
|
\I\c{.text} \I\c{.rodata} \I\c{.lrodata} \I\c{.data} \I\c{.ldata}
|
|
\I\c{.bss} \I\c{.lbss} \I\c{.tdata} \I\c{.tbss} \I\c{.comment}
|
|
|
|
\c section .text progbits alloc exec nowrite align=16
|
|
\c section .rodata progbits alloc noexec nowrite align=4
|
|
\c section .lrodata progbits alloc noexec nowrite align=4
|
|
\c section .data progbits alloc noexec write align=4
|
|
\c section .ldata progbits alloc noexec write align=4
|
|
\c section .bss nobits alloc noexec write align=4
|
|
\c section .lbss nobits alloc noexec write align=4
|
|
\c section .tdata progbits alloc noexec write align=4 tls
|
|
\c section .tbss nobits alloc noexec write align=4 tls
|
|
\c section .comment progbits noalloc noexec nowrite align=1
|
|
\c section .preinit_array preinit_array alloc noexec nowrite pointer
|
|
\c section .init_array init_array alloc noexec nowrite pointer
|
|
\c section .fini_array fini_array alloc noexec nowrite pointer
|
|
\c section .note note noalloc noexec nowrite align=4
|
|
\c section other progbits alloc noexec nowrite align=1
|
|
|
|
(Any section name other than those in the above table
|
|
is treated by default like \c{other} in the above table.
|
|
Please note that section names are case sensitive.)
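
As a sketch of how the qualifiers combine, the following declares two
non-standard sections and overrides the \c{other} defaults. The
section names \c{.mydata} and \c{.scratch} are arbitrary examples:

\c section .mydata  progbits alloc noexec write align=16
\c section .scratch nobits   alloc noexec write align=64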
|
|
|
|
|
|
\S{elfwrt} \i{Position-Independent Code}\I{PIC}: ELF Special
|
|
Symbols and \i\c{WRT}
|
|
|
|
Since \c{ELF} does not support segment-base references, the \c{WRT}
|
|
operator is not used for its normal purpose; therefore NASM's
|
|
\c{elf} output format makes use of \c{WRT} for a different purpose,
|
|
namely the PIC-specific \I{relocations, PIC-specific}relocation
|
|
types.
|
|
|
|
\c{elf} defines five special symbols which you can use as the
|
|
right-hand side of the \c{WRT} operator to obtain PIC relocation
|
|
types. They are \i\c{..gotpc}, \i\c{..gotoff}, \i\c{..got},
|
|
\i\c{..plt} and \i\c{..sym}. Their functions are summarized here:
|
|
|
|
\b Referring to the symbol marking the global offset table base
|
|
using \c{wrt ..gotpc} will end up giving the distance from the
|
|
beginning of the current section to the global offset table.
|
|
(\i\c{_GLOBAL_OFFSET_TABLE_} is the standard symbol name used to
|
|
refer to the \i{GOT}.) So you would then need to add \i\c{$$} to the
|
|
result to get the real address of the GOT.
|
|
|
|
\b Referring to a location in one of your own sections using \c{wrt
|
|
..gotoff} will give the distance from the beginning of the GOT to
|
|
the specified location, so that adding on the address of the GOT
|
|
would give the real address of the location you wanted.
|
|
|
|
\b Referring to an external or global symbol using \c{wrt ..got}
|
|
causes the linker to build an entry \e{in} the GOT containing the
|
|
address of the symbol, and the reference gives the distance from the
|
|
beginning of the GOT to the entry; so you can add on the address of
|
|
the GOT, load from the resulting address, and end up with the
|
|
address of the symbol.
|
|
|
|
\b Referring to a procedure name using \c{wrt ..plt} causes the
|
|
linker to build a \i{procedure linkage table} entry for the symbol,
|
|
and the reference gives the address of the \i{PLT} entry. You can
|
|
only use this in contexts which would generate a PC-relative
|
|
relocation normally (i.e. as the destination for \c{CALL} or
|
|
\c{JMP}), since ELF contains no relocation type to refer to PLT
|
|
entries absolutely.
|
|
|
|
\b Referring to a symbol name using \c{wrt ..sym} causes NASM to
|
|
write an ordinary relocation, but instead of making the relocation
|
|
relative to the start of the section and then adding on the offset
|
|
to the symbol, it will write a relocation record aimed directly at
|
|
the symbol in question. The distinction is a necessary one due to a
|
|
peculiarity of the dynamic linker.
|
|
|
|
A fuller explanation of how to use these relocation types to write
|
|
shared libraries entirely in NASM is given in \k{picdll}.
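
As a short illustration for \c{elf32}, the sketch below obtains the
GOT address in \c{EBX} and then uses the remaining relocation types.
The names \c{myfunc}, \c{extvar}, \c{extfunc} and \c{localvar} are
placeholders, and \c{localvar} is assumed to be defined in one of the
module's own sections:

\c         extern  _GLOBAL_OFFSET_TABLE_
\c         extern  extvar, extfunc
\c
\c myfunc:
\c         call    .get_GOT
\c .get_GOT:
\c         pop     ebx
\c         add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
\c
\c         lea     eax,[ebx+localvar wrt ..gotoff] ; address of own data
\c         mov     eax,[ebx+extvar wrt ..got]      ; address of extvar
\c         call    extfunc wrt ..plt               ; call via the PLT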
|
|
|
|
\S{elftls} \i{Thread Local Storage in ELF}\I{TLS}: \c{elf} Special
|
|
Symbols and \i\c{WRT}
|
|
|
|
\b In ELF32 mode, referring to an external or global symbol using
|
|
\c{wrt ..tlsie} \I\c{..tlsie}
|
|
causes the linker to build an entry \e{in} the GOT containing the
|
|
offset of the symbol within the TLS block, so you can access the value
|
|
of the symbol with code such as:
|
|
|
|
\c mov eax,[tid wrt ..tlsie]
|
|
\c mov [gs:eax],ebx
|
|
|
|
|
|
\b In ELF64 or ELFx32 mode, referring to an external or global symbol using
|
|
\c{wrt ..gottpoff} \I\c{..gottpoff}
|
|
causes the linker to build an entry \e{in} the GOT containing the
|
|
offset of the symbol within the TLS block, so you can access the value
|
|
of the symbol with code such as:
|
|
|
|
\c mov rax,[rel tid wrt ..gottpoff]
|
|
\c mov rcx,[fs:rax]
|
|
|
|
|
|
\S{elfglob} \c{elf} Extensions to the \c{GLOBAL} Directive\I{GLOBAL,
|
|
elf extensions to}\I{GLOBAL, aoutb extensions to}
|
|
|
|
\c{ELF} object files can contain more information about a global
|
|
symbol than just its address: they can contain the \I{symbols,
|
|
specifying sizes}\I{size, of symbols}size of the symbol and its
|
|
\I{symbols, specifying types}\I{type, of symbols}type as well. These
|
|
are not merely debugger conveniences, but are actually necessary when
|
|
the program being written is a \I{elf shared library}shared
|
|
library. NASM therefore supports some extensions to the \c{GLOBAL}
|
|
directive, allowing you to specify these features.
|
|
|
|
You can specify whether a global variable is a function or a data
|
|
object by suffixing the name with a colon and the word
|
|
\i\c{function} or \i\c{data}. (\i\c{object} is a synonym for
|
|
\c{data}.) For example:
|
|
|
|
\c global hashlookup:function, hashtable:data
|
|
|
|
exports the global symbol \c{hashlookup} as a function and
|
|
\c{hashtable} as a data object.
|
|
|
|
Optionally, you can control the ELF visibility of the symbol. Just
|
|
add one of the visibility keywords: \i\c{default}, \i\c{internal},
|
|
\i\c{hidden}, or \i\c{protected}. The default is \i\c{default} of
|
|
course. For example, to make \c{hashlookup} hidden:
|
|
|
|
\c global hashlookup:function hidden
|
|
|
|
Since version 2.15, it is possible to specify the symbol binding. The
keywords are \i\c{weak}, to generate a weak symbol, and \i\c{strong}.
The default is \i\c{strong}.
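
For instance, assuming the binding keyword is written after the colon
in the same way as the type and visibility keywords, \c{hashlookup}
could be exported as a weak function like this:

\c global hashlookup:function weak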
|
|
|
|
You can also specify the size of the data associated with the
|
|
symbol, as a numeric expression (which may involve labels, and even
|
|
forward references) after the type specifier. Like this:
|
|
|
|
\c global hashtable:data (hashtable.end - hashtable)
|
|
\c
|
|
\c hashtable:
|
|
\c db this,that,theother ; some data here
|
|
\c .end:
|
|
|
|
This makes NASM automatically calculate the length of the table and
|
|
place that information into the \c{ELF} symbol table.
|
|
|
|
Declaring the type and size of global symbols is necessary when
|
|
writing shared library code. For more information, see
|
|
\k{picglobal}.
|
|
|
|
|
|
\S{elfextrn} \c{elf} Extensions to the \c{EXTERN} Directive\I{EXTERN,
|
|
elf extensions to}\I{EXTERN, elf extensions to}
|
|
|
|
Since version 2.15, it is possible to specify the keyword \i\c{weak} to
generate a weak external reference. Example:
|
|
|
|
\c extern weak_ref:weak
|
|
|
|
|
|
\S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive
|
|
\I{COMMON, elf extensions to}
|
|
|
|
\c{ELF} also allows you to specify alignment requirements \I{common
|
|
variables, alignment in elf}\I{alignment, of elf common variables}on
|
|
common variables. This is done by putting a number (which must be a
|
|
power of two) after the name and size of the common variable,
|
|
separated (as usual) by a colon. For example, an array of
|
|
doublewords would benefit from 4-byte alignment:
|
|
|
|
\c common dwordarray 128:4
|
|
|
|
This declares the total size of the array to be 128 bytes, and
|
|
requires that it be aligned on a 4-byte boundary.
|
|
|
|
|
|
\S{elf16} 16-bit code and ELF
|
|
\I{ELF, 16-bit code}
|
|
|
|
Older versions of the \c{ELF32} specification did not provide
|
|
relocations for 8- and 16-bit values. They are now part of the formal
|
|
specification, and any new enough linker should support them.
|
|
|
|
ELF currently has no support for segmented programming.
|
|
|
|
\S{elfdbg} Debug formats and ELF
|
|
\I{ELF, debug formats}
|
|
|
|
ELF provides debug information in \c{STABS} and \c{DWARF} formats.
|
|
Line number information is generated for all executable sections, but please
|
|
note that only the \c{.text} section is executable by default.
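
For example, DWARF debug information for a 64-bit ELF object can be
requested on the command line as follows, where \c{myfile.asm} is a
placeholder:

\c nasm -f elf64 -g -F dwarf myfile.asm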
|
|
|
|
\H{aoutfmt} \i\c{aout}: Linux \I{a.out, Linux version}\I{linux, a.out}\c{a.out} Object Files
|
|
|
|
The \c{aout} format generates \c{a.out} object files, in the form used
|
|
by early Linux systems (current Linux systems use ELF, see
|
|
\k{elffmt}.) These differ from other \c{a.out} object files in that
|
|
the magic number in the first four bytes of the file is
|
|
different; also, some implementations of \c{a.out}, for example
|
|
NetBSD's, support position-independent code, which Linux's
|
|
implementation does not.
|
|
|
|
\c{a.out} provides a default output file-name extension of \c{.o}.
|
|
|
|
\c{a.out} is a very simple object format. It supports no special
|
|
directives, no special symbols, no use of \c{SEG} or \c{WRT}, and no
|
|
extensions to any standard directives. It supports only the three
|
|
\i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}.
|
|
|
|
|
|
\H{aoutfmt} \i\c{aoutb}: \i{NetBSD}/\i{FreeBSD}/\i{OpenBSD}
\I{a.out, BSD version}\c{a.out} Object Files

The \c{aoutb} format generates \c{a.out} object files, in the form
used by the various free \c{BSD Unix} clones, \c{NetBSD}, \c{FreeBSD}
and \c{OpenBSD}. For simple object files, this object format is exactly
the same as \c{aout} except for the magic number in the first four bytes
of the file. However, the \c{aoutb} format supports
\I{PIC}\i{position-independent code} in the same way as the \c{elf}
format, so you can use it to write \c{BSD} \i{shared libraries}.

\c{aoutb} provides a default output file-name extension of \c{.o}.

\c{aoutb} supports no special directives, no special symbols, and
only the three \i{standard section names} \i\c{.text}, \i\c{.data}
and \i\c{.bss}. However, it also supports the same use of \i\c{WRT} as
\c{elf} does, to provide position-independent code relocation types.
See \k{elfwrt} for full documentation of this feature.

\c{aoutb} also supports the same extensions to the \c{GLOBAL}
directive as \c{elf} does: see \k{elfglob} for documentation of
this.
|
|
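As a hedged sketch of both features, the fragment below obtains the
address of the global offset table with \c{WRT ..gotpc} and exports a
function using a \c{GLOBAL} type qualifier, following the patterns
documented for \c{elf}. The label names are illustrative, and the
symbol name \c{__GLOBAL_OFFSET_TABLE_} (with two leading underscores)
is an assumption about the BSD \c{a.out} convention that should be
checked against the target system.

\c         extern  __GLOBAL_OFFSET_TABLE_   ; assumed BSD a.out name
\c         global  func:function            ; GLOBAL extension, as in elf
\c
\c         section .text
\c func:   push    ebx
\c         call    .get_GOT
\c .get_GOT:
\c         pop     ebx                      ; ebx = address of .get_GOT
\c         add     ebx, __GLOBAL_OFFSET_TABLE_ + $$ - .get_GOT wrt ..gotpc
\c         ; ebx now holds the address of the GOT
\c         pop     ebx
\c         ret

It would be assembled with:

\c nasm -f aoutb func.asm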
|
|
|
|
\H{as86fmt} \c{as86}: \i{Minix}/Linux\I{linux, as86} \i\c{as86} Object Files

The Minix/Linux 16-bit assembler \c{as86} has its own non-standard
object file format. Although its companion linker \i\c{ld86} produces
something close to ordinary \c{a.out} binaries as output, the object
file format used to communicate between \c{as86} and \c{ld86} is not
itself \c{a.out}.

NASM supports this format, just in case it is useful, as \c{as86}.
\c{as86} provides a default output file-name extension of \c{.o}.

\c{as86} is a very simple object format (from the NASM user's point
of view). It supports no special directives, no use of \c{SEG} or \c{WRT},
and no extensions to any standard directives. It supports only the three
\i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}. The
only special symbol supported is \c{..start}.
|
|
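A minimal sketch of an \c{as86} source file that defines the
\c{..start} special symbol, which is assumed here (as in \c{obj}) to
mark the program entry point. The instructions, the label
\c{scratch} and the file name \c{prog.asm} are placeholders rather
than a complete program.

\c         section .text
\c ..start:
\c         xor     ax, ax
\c         ret
\c
\c         section .bss
\c scratch: resb   16

It would be assembled with:

\c nasm -f as86 prog.asm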
|
|
|
|
\H{dbgfmt} \i\c{dbg}: Debugging Format

The \c{dbg} format does not output an object file as such; instead,
it outputs a text file which contains a complete list of all the
transactions between the main body of NASM and the output-format
back end module. It is primarily intended to aid people who want to
write their own output drivers, so that they can get a clearer idea
of the various requests the main program makes of the output driver,
and in what order they happen.

For simple files, one can easily use the \c{dbg} format like this:

\c nasm -f dbg filename.asm

which will generate a diagnostic file called \c{filename.dbg}.
However, this will not work well on files which were designed for a
different object format, because each object format defines its own
macros (usually user-level forms of directives), and those macros
will not be defined in the \c{dbg} format. Therefore it can be
useful to run NASM twice, in order to do the preprocessing with the
native object format selected:

\c nasm -e -f elf32 -o elfprog.i elfprog.asm
\c nasm -a -f dbg elfprog.i

This preprocesses \c{elfprog.asm} into \c{elfprog.i}, keeping the
\c{elf32} object format selected in order to make sure ELF special
directives are converted into primitive form correctly. Then the
preprocessed source is fed through the \c{dbg} format to generate the
final diagnostic output.

This workaround will still typically not work for programs intended
for \c{obj} format, because the \c{obj} \c{SEGMENT} and \c{GROUP}
directives have side effects of defining the segment and group names
as symbols; \c{dbg} will not do this, so the program will not
assemble. You will have to work around that by defining the symbols
yourself (using \c{EXTERN}, for example) if you really need to get a
\c{dbg} trace of an \c{obj}-specific source file.

\c{dbg} accepts any section name and any directives at all, and logs
them all to its output file.

\c{dbg} accepts and logs any \c{%pragma}, but the specific
\c{%pragma}:

\c %pragma dbg maxdump <size>

where \c{<size>} is either a number or \c{unlimited}, can be used to
control the maximum size for dumping the full contents of a
\c{rawdata} output object.
|
|
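For example, either of the following lines could appear in a source
file assembled with \c{-f dbg}. The value \c{64} is an arbitrary
illustration, not a recommended setting.

\c %pragma dbg maxdump 64
\c %pragma dbg maxdump unlimited

The first form caps the dump at the given size; the second removes
the cap.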
|
|
|