and tell what opcode is generated by it. You can't do this in MASM:
if you declare, for example,
-\c foo equ 1
-\c bar dw 2
+\c foo equ 1
+\c bar dw 2
then the two lines of code
-\c mov ax,foo
-\c mov ax,bar
+\c mov ax,foo
+\c mov ax,bar
generate completely different opcodes, despite having
identical-looking syntaxes.
all forms of each supported instruction are given in
\k{iref}. For example, you can code:
-\c fadd st1 ; this sets st0 := st0 + st1
-\c fadd st0,st1 ; so does this
+\c fadd st1 ; this sets st0 := st0 + st1
+\c fadd st0,st1 ; so does this
\c
-\c fadd st1,st0 ; this sets st1 := st1 + st0
-\c fadd to st1 ; so does this
+\c fadd st1,st0 ; this sets st1 := st1 + st0
+\c fadd to st1 ; so does this
Almost any floating-point instruction that references memory must
use one of the prefixes \i\c{DWORD}, \i\c{QWORD} or \i\c{TWORD} to
be invoked in a wide range of ways:
\I{floating-point}\I{character constant}\I{string constant}
-\c db 0x55 ; just the byte 0x55
-\c db 0x55,0x56,0x57 ; three bytes in succession
-\c db 'a',0x55 ; character constants are OK
-\c db 'hello',13,10,'$' ; so are string constants
-\c dw 0x1234 ; 0x34 0x12
-\c dw 'a' ; 0x41 0x00 (it's just a number)
-\c dw 'ab' ; 0x41 0x42 (character constant)
-\c dw 'abc' ; 0x41 0x42 0x43 0x00 (string)
-\c dd 0x12345678 ; 0x78 0x56 0x34 0x12
-\c dd 1.234567e20 ; floating-point constant
-\c dq 1.234567e20 ; double-precision float
-\c dt 1.234567e20 ; extended-precision float
+\c db 0x55 ; just the byte 0x55
+\c db 0x55,0x56,0x57 ; three bytes in succession
+\c db 'a',0x55 ; character constants are OK
+\c db 'hello',13,10,'$' ; so are string constants
+\c dw 0x1234 ; 0x34 0x12
+\c dw 'a' ; 0x61 0x00 (it's just a number)
+\c dw 'ab' ; 0x61 0x62 (character constant)
+\c dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
+\c dd 0x12345678 ; 0x78 0x56 0x34 0x12
+\c dd 1.234567e20 ; floating-point constant
+\c dq 1.234567e20 ; double-precision float
+\c dt 1.234567e20 ; extended-precision float
\c{DQ} and \c{DT} do not accept \i{numeric constants} or string
constants as operands.
For example:
-\c buffer: resb 64 ; reserve 64 bytes
-\c wordvar: resw 1 ; reserve a word
-\c realarray resq 10 ; array of ten reals
+\c buffer: resb 64 ; reserve 64 bytes
+\c wordvar: resw 1 ; reserve a word
+\c realarray resq 10 ; array of ten reals
\S{incbin} \i\c{INCBIN}: Including External \i{Binary Files}
directly into a game executable file. It can be called in one of
these three ways:
-\c incbin "file.dat" ; include the whole file
-\c incbin "file.dat",1024 ; skip the first 1024 bytes
-\c incbin "file.dat",1024,512 ; skip the first 1024, and
-\c ; actually include at most 512
+\c incbin "file.dat" ; include the whole file
+\c incbin "file.dat",1024 ; skip the first 1024 bytes
+\c incbin "file.dat",1024,512 ; skip the first 1024, and
+\c ; actually include at most 512
\S{equ} \i\c{EQU}: Defining Constants
This definition is absolute, and cannot change later. So, for
example,
-\c message db 'hello, world'
-\c msglen equ $-message
+\c message db 'hello, world'
+\c msglen equ $-message
defines \c{msglen} to be the constant 12. \c{msglen} may not then be
redefined later. This is not a \i{preprocessor} definition either:
syntax supported by \i{MASM}-compatible assemblers, in that you can
code
-\c zerobuf: times 64 db 0
+\c zerobuf: times 64 db 0
or similar things; but \c{TIMES} is more versatile than that. The
argument to \c{TIMES} is not just a numeric constant, but a numeric
\e{expression}, so you can do things like
-\c buffer: db 'hello, world'
-\c times 64-$+buffer db ' '
+\c buffer: db 'hello, world'
+\c times 64-$+buffer db ' '
which will store exactly enough spaces to make the total length of
\c{buffer} up to 64. Finally, \c{TIMES} can be applied to ordinary
instructions, so you can code trivial \i{unrolled loops} in it:
-\c times 100 movsb
+\c times 100 movsb
Note that there is no effective difference between \c{times 100 resb
1} and \c{resb 100}, except that the latter will be assembled about
to the desired address, enclosed in \i{square brackets}. For
example:
-\c wordvar dw 123
-\c mov ax,[wordvar]
-\c mov ax,[wordvar+1]
-\c mov ax,[es:wordvar+bx]
+\c wordvar dw 123
+\c mov ax,[wordvar]
+\c mov ax,[wordvar+1]
+\c mov ax,[es:wordvar+bx]
Anything not conforming to this simple system is not a valid memory
reference in NASM, for example \c{es:wordvar[bx]}.
More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:
-\c mov eax,[ebx*2+ecx+offset]
-\c mov ax,[bp+di+8]
+\c mov eax,[ebx*2+ecx+offset]
+\c mov ax,[bp+di+8]
NASM is capable of doing \i{algebra} on these effective addresses,
so that things which don't necessarily \e{look} legal are perfectly
all right:
-\c mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
-\c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
+\c mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
+\c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
Some examples:
-\c mov ax,100 ; decimal
-\c mov ax,0a2h ; hex
-\c mov ax,$0a2 ; hex again: the 0 is required
-\c mov ax,0xa2 ; hex yet again
-\c mov ax,777q ; octal
-\c mov ax,10010011b ; binary
+\c mov ax,100 ; decimal
+\c mov ax,0a2h ; hex
+\c mov ax,$0a2 ; hex again: the 0 is required
+\c mov ax,0xa2 ; hex yet again
+\c mov ax,777q ; octal
+\c mov ax,10010011b ; binary
\S{chrconst} \i{Character Constants}
is treated as a concatenation of maximum-size character constants
for the conditions. So the following are equivalent:
-\c db 'hello' ; string constant
-\c db 'h','e','l','l','o' ; equivalent character constants
+\c db 'hello' ; string constant
+\c db 'h','e','l','l','o' ; equivalent character constants
And the following are also equivalent:
-\c dd 'ninechars' ; doubleword string constant
-\c dd 'nine','char','s' ; becomes three doublewords
-\c db 'ninechars',0,0,0 ; and really looks like this
+\c dd 'ninechars' ; doubleword string constant
+\c dd 'nine','char','s' ; becomes three doublewords
+\c db 'ninechars',0,0,0 ; and really looks like this
Note that when used as an operand to \c{db}, a constant like
\c{'ab'} is treated as a string constant despite being short enough
Some examples:
-\c dd 1.2 ; an easy one
-\c dq 1.e10 ; 10,000,000,000
-\c dq 1.e+10 ; synonymous with 1.e10
-\c dq 1.e-10 ; 0.000 000 000 1
-\c dt 3.141592653589793238462 ; pi
+\c dd 1.2 ; an easy one
+\c dq 1.e10 ; 10,000,000,000
+\c dq 1.e+10 ; synonymous with 1.e10
+\c dq 1.e-10 ; 0.000 000 000 1
+\c dt 3.141592653589793238462 ; pi
NASM cannot do compile-time arithmetic on floating-point constants.
This is because NASM is designed to be portable - although it always
symbol, defined as the segment base relative to which the offset of
the symbol makes sense. So the code
-\c mov ax,seg symbol
-\c mov es,ax
-\c mov bx,symbol
+\c mov ax,seg symbol
+\c mov es,ax
+\c mov bx,symbol
will load \c{ES:BX} with a valid pointer to the symbol \c{symbol}.
preferred one. NASM lets you do this, by the use of the \c{WRT}
(With Reference To) keyword. So you can do things like
-\c mov ax,weird_seg ; weird_seg is a segment base
-\c mov es,ax
-\c mov bx,symbol wrt weird_seg
+\c mov ax,weird_seg ; weird_seg is a segment base
+\c mov es,ax
+\c mov bx,symbol wrt weird_seg
to load \c{ES:BX} with a different, but functionally equivalent,
pointer to the symbol \c{symbol}.
both represent immediate values. So to call a far procedure, you
could code either of
-\c call (seg procedure):procedure
-\c call weird_seg:(procedure wrt weird_seg)
+\c call (seg procedure):procedure
+\c call weird_seg:(procedure wrt weird_seg)
(The parentheses are included for clarity, to show the intended
parsing of the above instructions. They are not necessary in
To declare a \i{far pointer} to a data item in a data segment, you
must code
-\c dw symbol, seg symbol
+\c dw symbol, seg symbol
NASM supports no convenient synonym for this, though you can always
invent one using the macro processor.
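+
+As a minimal sketch of such a synonym (the macro name \c{farptr} and the
+labels used with it are purely illustrative, not something NASM
+provides), a one-parameter multi-line macro could wrap the declaration
+above:
+
+\c %macro  farptr 1
+\c
+\c         dw      %1, seg %1      ; offset first, then segment
+\c
+\c %endmacro
+\c
+\c msg_ptr:        farptr  message ; assumes `message' is defined elsewhere
+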
thing NASM can't handle is code whose size depends on the value of a
symbol declared after the code in question. For example,
-\c times (label-$) db 0
-\c label: db 'Where am I?'
+\c times (label-$) db 0
+\c label: db 'Where am I?'
The argument to \i\c{TIMES} in this case could equally legally
evaluate to anything at all; NASM will reject this example because
It will just as firmly reject the slightly \I{paradox}paradoxical
code
-\c times (label-$+1) db 0
-\c label: db 'NOW where am I?'
+\c times (label-$+1) db 0
+\c label: db 'NOW where am I?'
in which \e{any} value for the \c{TIMES} argument is by definition
wrong!
Critical expressions can crop up in other contexts as well: consider
the following code.
-\c mov ax,symbol1
-\c symbol1 equ symbol2
+\c mov ax,symbol1
+\c symbol1 equ symbol2
\c symbol2:
On the first pass, NASM cannot determine the value of \c{symbol1},
There is a related issue involving \i{forward references}: consider
this code fragment.
-\c mov eax,[ebx+offset]
-\c offset equ 10
+\c mov eax,[ebx+offset]
+\c offset equ 10
NASM, on pass one, must calculate the size of the instruction \c{mov
eax,[ebx+offset]} without knowing the value of \c{offset}. It has no
label, which means that it is associated with the previous non-local
label. So, for example:
-\c label1 ; some code
-\c .loop ; some more code
-\c jne .loop
-\c ret
-\c label2 ; some code
-\c .loop ; some more code
-\c jne .loop
-\c ret
+\c label1 ; some code
+\c
+\c .loop
+\c ; some more code
+\c
+\c jne .loop
+\c ret
+\c
+\c label2 ; some code
+\c
+\c .loop
+\c ; some more code
+\c
+\c jne .loop
+\c ret
In the above code fragment, each \c{JNE} instruction jumps to the
line immediately before it, because the two definitions of \c{.loop}
defines a symbol called \c{label2.loop}. So, if you really needed
to, you could write
-\c label3 ; some more code
-\c ; and some more
-\c jmp label1.loop
+\c label3 ; some more code
+\c ; and some more
+\c
+\c jmp label1.loop
Sometimes it is useful - in a macro, for instance - to be able to
define a label which can be referenced from anywhere but which
the \I{label prefix}special prefix \i\c{..@}, then it does nothing
to the local label mechanism. So you could code
-\c label1: ; a non-local label
-\c .local: ; this is really label1.local
-\c ..@foo: ; this is a special symbol
-\c label2: ; another non-local label
-\c .local: ; this is really label2.local
-\c jmp ..@foo ; this will jump three lines up
+\c label1: ; a non-local label
+\c .local: ; this is really label1.local
+\c ..@foo: ; this is a special symbol
+\c label2: ; another non-local label
+\c .local: ; this is really label2.local
+\c
+\c jmp ..@foo ; this will jump three lines up
NASM has the capacity to define other special symbols beginning with
a double period: for example, \c{..start} is used to specify the
directive. The definitions work in a similar way to C; so you can do
things like
-\c %define ctrl 0x1F &
+\c %define ctrl 0x1F &
\c %define param(a,b) ((a)+(a)*(b))
-\c mov byte [param(2,ebx)], ctrl 'D'
+\c
+\c mov byte [param(2,ebx)], ctrl 'D'
which will expand to
-\c mov byte [(2)+(2)*(ebx)], 0x1F & 'D'
+\c mov byte [(2)+(2)*(ebx)], 0x1F & 'D'
When the expansion of a single-line macro contains tokens which
invoke another macro, the expansion is performed at invocation time,
not at definition time. Thus the code
-\c %define a(x) 1+b(x)
-\c %define b(x) 2*x
-\c mov ax,a(8)
+\c %define a(x) 1+b(x)
+\c %define b(x) 2*x
+\c
+\c mov ax,a(8)
will evaluate in the expected way to \c{mov ax,1+2*8}, even though
the macro \c{b} wasn't defined at the time of definition of \c{a}.
preprocessor will only expand the first occurrence of the macro.
Hence, if you code
-\c %define a(x) 1+a(x)
-\c mov ax,a(3)
+\c %define a(x) 1+a(x)
+\c
+\c mov ax,a(3)
the macro \c{a(3)} will expand once, becoming \c{1+a(3)}, and will
then expand no further. This behaviour can be useful: see \k{32c}
You can \I{overloading, single-line macros}overload single-line
macros: if you write
-\c %define foo(x) 1+x
+\c %define foo(x) 1+x
\c %define foo(x,y) 1+x*y
the preprocessor will be able to handle both types of macro call,
As an example, consider the following:
-\c %define BDASTART 400h ; Start of BIOS data area
+\c %define BDASTART 400h ; Start of BIOS data area
-\c struc tBIOSDA ; its structure
-\c .COM1addr RESW 1
-\c .COM2addr RESW 1
-\c ; ..and so on
+\c struc tBIOSDA ; its structure
+\c .COM1addr RESW 1
+\c .COM2addr RESW 1
+\c ; ..and so on
\c endstruc
Now, if we need to access the elements of tBIOSDA in different places,
we can end up with:
-\c mov ax,BDASTART + tBIOSDA.COM1addr
-\c mov bx,BDASTART + tBIOSDA.COM2addr
+\c mov ax,BDASTART + tBIOSDA.COM1addr
+\c mov bx,BDASTART + tBIOSDA.COM2addr
This will become pretty ugly (and tedious) if used in many places, and
can be reduced in size significantly by using the following macro:
\c ; Macro to access BIOS variables by their names (from tBIOSDA):
-\c %define BDA(x) BDASTART + tBIOSDA. %+ x
+\c %define BDA(x) BDASTART + tBIOSDA. %+ x
Now the above code can be written as:
-\c mov ax,BDA(COM1addr)
-\c mov bx,BDA(COM2addr)
+\c mov ax,BDA(COM1addr)
+\c mov bx,BDA(COM2addr)
Using this feature, we can simplify references to a lot of macros (and,
in turn, reduce typing errors).
\c %define foo bar
\c %undef foo
-\c mov eax, foo
+\c
+\c mov eax, foo
will expand to the instruction \c{mov eax, foo}, since after
\c{%undef} the macro \c{foo} is no longer defined.
Individual letters in strings can be extracted using \c{%substr}.
An example of its use is probably more useful than the description:
-\c %substr mychar 'xyz' 1 ; equivalent to %define mychar 'x'
-\c %substr mychar 'xyz' 2 ; equivalent to %define mychar 'y'
-\c %substr mychar 'xyz' 3 ; equivalent to %define mychar 'z'
+\c %substr mychar 'xyz' 1 ; equivalent to %define mychar 'x'
+\c %substr mychar 'xyz' 2 ; equivalent to %define mychar 'y'
+\c %substr mychar 'xyz' 3 ; equivalent to %define mychar 'z'
After these three lines, \c{mychar} has the value 'z', since each
\c{%substr} redefines it in turn. As with \c{%strlen}
(see \k{strlen}), the first parameter is the single-line macro to
and TASM: a multi-line macro definition in NASM looks something like
this.
-\c %macro prologue 1
-\c push ebp
-\c mov ebp,esp
-\c sub esp,%1
+\c %macro prologue 1
+\c
+\c push ebp
+\c mov ebp,esp
+\c sub esp,%1
+\c
\c %endmacro
This defines a C-like function prologue as a macro: so you would
which would expand to the three lines of code
-\c myfunc: push ebp
-\c mov ebp,esp
-\c sub esp,12
+\c myfunc: push ebp
+\c mov ebp,esp
+\c sub esp,12
The number \c{1} after the macro name in the \c{%macro} line defines
the number of parameters the macro \c{prologue} expects to receive.
in \I{braces, around macro parameters}braces. So you could code
things like
-\c %macro silly 2
-\c %2: db %1
+\c %macro silly 2
+\c
+\c %2: db %1
+\c
\c %endmacro
-\c silly 'a', letter_a ; letter_a: db 'a'
-\c silly 'ab', string_ab ; string_ab: db 'ab'
-\c silly {13,10}, crlf ; crlf: db 13,10
+\c
+\c silly 'a', letter_a ; letter_a: db 'a'
+\c silly 'ab', string_ab ; string_ab: db 'ab'
+\c silly {13,10}, crlf ; crlf: db 13,10
\S{mlmacover} \i{Overloading Multi-Line Macros}
parameters. This time, no exception is made for macros with no
parameters at all. So you could define
-\c %macro prologue 0
-\c push ebp
-\c mov ebp,esp
+\c %macro prologue 0
+\c
+\c push ebp
+\c mov ebp,esp
+\c
\c %endmacro
to define an alternative form of the function prologue which
Sometimes, however, you might want to `overload' a machine
instruction; for example, you might want to define
-\c %macro push 2
-\c push %1
-\c push %2
+\c %macro push 2
+\c
+\c push %1
+\c push %2
+\c
\c %endmacro
so that you could code
-\c push ebx ; this line is not a macro call
-\c push eax,ecx ; but this one is
+\c push ebx ; this line is not a macro call
+\c push eax,ecx ; but this one is
Ordinarily, NASM will give a warning for the first of the above two
lines, since \c{push} is now defined to be a macro, and is being
you can invent an instruction which executes a \c{RET} if the \c{Z}
flag is set by doing this:
-\c %macro retz 0
-\c jnz %%skip
-\c ret
-\c %%skip:
+\c %macro retz 0
+\c
+\c jnz %%skip
+\c ret
+\c %%skip:
+\c
\c %endmacro
You can call this macro as many times as you want, and every time
might be a macro to write a text string to a file in MS-DOS, where
you might want to be able to write
-\c writefile [filehandle],"hello, world",13,10
+\c writefile [filehandle],"hello, world",13,10
NASM allows you to define the last parameter of a macro to be
\e{greedy}, meaning that if you invoke the macro with more
the last defined one along with the separating commas. So if you
code:
-\c %macro writefile 2+
-\c jmp %%endstr
-\c %%str: db %2
-\c %%endstr: mov dx,%%str
-\c mov cx,%%endstr-%%str
-\c mov bx,%1
-\c mov ah,0x40
-\c int 0x21
+\c %macro writefile 2+
+\c
+\c jmp %%endstr
+\c %%str: db %2
+\c %%endstr:
+\c mov dx,%%str
+\c mov cx,%%endstr-%%str
+\c mov bx,%1
+\c mov ah,0x40
+\c int 0x21
+\c
\c %endmacro
then the example call to \c{writefile} above will work as expected:
of allowable parameter counts. If you do this, you can specify
defaults for \i{omitted parameters}. So, for example:
-\c %macro die 0-1 "Painful program death has occurred."
-\c writefile 2,%1
-\c mov ax,0x4c01
-\c int 0x21
+\c %macro die 0-1 "Painful program death has occurred."
+\c
+\c writefile 2,%1
+\c mov ax,0x4c01
+\c int 0x21
+\c
\c %endmacro
This macro (which makes use of the \c{writefile} macro defined in
\I{iterating over macro parameters}So a pair of macros to save and
restore a set of registers might work as follows:
-\c %macro multipush 1-*
-\c %rep %0
-\c push %1
-\c %rotate 1
-\c %endrep
+\c %macro multipush 1-*
+\c
+\c %rep %0
+\c push %1
+\c %rotate 1
+\c %endrep
+\c
\c %endmacro
This macro invokes the \c{PUSH} instruction on each of its arguments
This can be done by the following definition:
-\c %macro multipop 1-*
-\c %rep %0
-\c %rotate -1
-\c pop %1
-\c %endrep
+\c %macro multipop 1-*
+\c
+\c %rep %0
+\c %rotate -1
+\c pop %1
+\c %endrep
+\c
\c %endmacro
This macro begins by rotating its arguments one place to the
something like
\c %macro keytab_entry 2
-\c keypos%1 equ $-keytab
-\c db %2
+\c
+\c keypos%1 equ $-keytab
+\c db %2
+\c
\c %endmacro
+\c
\c keytab:
\c keytab_entry F1,128+1
\c keytab_entry F2,128+2
which would expand to
\c keytab:
-\c keyposF1 equ $-keytab
-\c db 128+1
-\c keyposF2 equ $-keytab
-\c db 128+2
-\c keyposReturn equ $-keytab
-\c db 13
+\c keyposF1 equ $-keytab
+\c db 128+1
+\c keyposF2 equ $-keytab
+\c db 128+2
+\c keyposReturn equ $-keytab
+\c db 13
You can just as easily concatenate text on to the other end of a
macro parameter, by writing \c{%1foo}.
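+
+For instance, a variant of the macro above (the name \c{keytab_entry2}
+and the \c{_pos} suffix are only illustrative) could attach the text
+after the parameter rather than before it:
+
+\c %macro  keytab_entry2 2
+\c
+\c     %1_pos      equ     $-keytab
+\c                 db      %2
+\c
+\c %endmacro
+\c
+\c keytab:
+\c         keytab_entry2 F1,128+1  ; defines F1_pos
+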
condition code. So the \c{retz} macro defined in \k{maclocal} can be
replaced by a general \i{conditional-return macro} like this:
-\c %macro retc 1
-\c j%-1 %%skip
-\c ret
-\c %%skip:
+\c %macro retc 1
+\c
+\c j%-1 %%skip
+\c ret
+\c %%skip:
+\c
\c %endmacro
This macro can now be invoked using calls like \c{retc ne}, which
syntax of this feature looks like this:
\c %if<condition>
-\c ; some code which only appears if <condition> is met
+\c ; some code which only appears if <condition> is met
\c %elif<condition2>
-\c ; only appears if <condition> is not met but <condition2> is
+\c ; only appears if <condition> is not met but <condition2> is
\c %else
-\c ; this appears if neither <condition> nor <condition2> was met
+\c ; this appears if neither <condition> nor <condition2> was met
\c %endif
The \i\c{%else} clause is optional, as is the \i\c{%elif} clause.
\i\c{%elifndef}.
+\S{ifmacro} \i\c{%ifmacro}: \i{Testing Multi-Line Macro Existence}
+
+The \c{%ifmacro} directive operates in the same way as the \c{%ifdef}
+directive, except that it checks for the existence of a multi-line macro.
+
+For example, you may be working with a large project and not have control
+over the macros in a library. You may want to create a macro with one
+name if it doesn't already exist, and another name if one with that name
+does exist.
+
+\c{%ifmacro} is considered true if defining a macro with the given name
+and number of arguments would cause a definition conflict. For example:
+
+\c %ifmacro MyMacro 1-3
+\c
+\c %error "MyMacro 1-3" causes a conflict with an existing macro.
+\c
+\c %else
+\c
+\c %macro MyMacro 1-3
+\c
+\c ; insert code to define the macro
+\c
+\c %endmacro
+\c
+\c %endif
+
+This will create the macro \c{MyMacro 1-3} if no macro already exists which
+would conflict with it, and reports an error if there would be a definition
+conflict.
+
+You can test for the macro not existing by using \i\c{%ifnmacro} instead
+of \c{%ifmacro}. Additional tests can be performed in \c{%elif} blocks by
+using \i\c{%elifmacro} and \i\c{%elifnmacro}.
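+
+As a brief sketch of the negated form (reusing the hypothetical
+\c{MyMacro} from the example above), the macro is defined only when
+doing so would not conflict with an existing definition:
+
+\c %ifnmacro MyMacro 1-3
+\c
+\c      %macro MyMacro 1-3
+\c
+\c              ; insert code to define the macro
+\c
+\c      %endmacro
+\c
+\c %endif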
+
+
\S{ifctx} \i\c{%ifctx}: \i{Testing the Context Stack}
The conditional-assembly construct \c{%ifctx ctxname} will cause the
For example, the following macro pushes a register or number on the
stack, and allows you to treat \c{IP} as a real register:
-\c %macro pushparam 1
-\c %ifidni %1,ip
-\c call %%label
-\c %%label:
-\c %else
-\c push %1
-\c %endif
+\c %macro pushparam 1
+\c
+\c %ifidni %1,ip
+\c call %%label
+\c %%label:
+\c %else
+\c push %1
+\c %endif
+\c
\c %endmacro
Like most other \c{%if} constructs, \c{%ifidn} has a counterpart
extended to take advantage of \c{%ifstr} in the following fashion:
\c %macro writefile 2-3+
-\c %ifstr %2
-\c jmp %%endstr
-\c %if %0 = 3
-\c %%str: db %2,%3
-\c %else
-\c %%str: db %2
-\c %endif
-\c %%endstr: mov dx,%%str
-\c mov cx,%%endstr-%%str
-\c %else
-\c mov dx,%2
-\c mov cx,%3
-\c %endif
-\c mov bx,%1
-\c mov ah,0x40
-\c int 0x21
+\c
+\c %ifstr %2
+\c jmp %%endstr
+\c %if %0 = 3
+\c %%str: db %2,%3
+\c %else
+\c %%str: db %2
+\c %endif
+\c %%endstr: mov dx,%%str
+\c mov cx,%%endstr-%%str
+\c %else
+\c mov dx,%2
+\c mov cx,%3
+\c %endif
+\c mov bx,%1
+\c mov ah,0x40
+\c int 0x21
+\c
\c %endmacro
Then the \c{writefile} macro can cope with being called in either of
the following two ways:
-\c writefile [file], strpointer, length
-\c writefile [file], "hello", 13, 10
+\c writefile [file], strpointer, length
+\c writefile [file], "hello", 13, 10
In the first, \c{strpointer} is used as the address of an
already-declared string, and \c{length} is used as its length; in
the right macros by means of code like this:
\c %ifdef SOME_MACRO
-\c ; do some setup
+\c ; do some setup
\c %elifdef SOME_OTHER_MACRO
-\c ; do some different setup
+\c ; do some different setup
\c %else
-\c %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined.
+\c %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined.
\c %endif
Then any user who fails to understand the way your code is supposed
replicated as many times as specified by the preprocessor:
\c %assign i 0
-\c %rep 64
-\c inc word [table+2*i]
+\c %rep 64
+\c inc word [table+2*i]
\c %assign i i+1
\c %endrep
\c %assign j 1
\c %rep 100
\c %if j > 65535
-\c %exitrep
+\c %exitrep
\c %endif
-\c dw j
+\c dw j
\c %assign k j+i
\c %assign i j
\c %assign j k
\c %endrep
+\c
\c fib_number equ ($-fibonacci)/2
This produces a list of all the Fibonacci numbers that will fit in
the form
\c %ifndef MACROS_MAC
-\c %define MACROS_MAC
-\c ; now define some macros
+\c %define MACROS_MAC
+\c ; now define some macros
\c %endif
then including the file more than once will not cause errors,
on the top of the context stack. \c{%push} requires one argument,
which is the name of the context. For example:
-\c %push foobar
+\c %push foobar
This pushes a new context called \c{foobar} on the stack. You can
have several contexts on the stack with the same name: they can
above could be implemented by means of:
\c %macro repeat 0
-\c %push repeat
-\c %$begin:
+\c
+\c %push repeat
+\c %$begin:
+\c
\c %endmacro
-
+\c
\c %macro until 1
-\c j%-1 %$begin
-\c %pop
+\c
+\c j%-1 %$begin
+\c %pop
+\c
\c %endmacro
and invoked by means of, for example,
-\c mov cx,string
-\c repeat
-\c add cx,3
-\c scasb
-\c until e
+\c mov cx,string
+\c repeat
+\c add cx,3
+\c scasb
+\c until e
which would scan every fourth byte of a string in search of the byte
in \c{AL}.
labels. So you could replace the destructive code
\c %pop
-\c %push newname
+\c %push newname
with the non-destructive version \c{%repl newname}.
implement a block IF statement as a set of macros.
\c %macro if 1
+\c
\c %push if
-\c j%-1 %$ifnot
+\c j%-1 %$ifnot
+\c
\c %endmacro
-
+\c
\c %macro else 0
-\c %ifctx if
-\c %repl else
-\c jmp %$ifend
+\c
+\c %ifctx if
+\c %repl else
+\c jmp %$ifend
\c %$ifnot:
-\c %else
-\c %error "expected `if' before `else'"
-\c %endif
+\c %else
+\c %error "expected `if' before `else'"
+\c %endif
+\c
\c %endmacro
-
+\c
\c %macro endif 0
-\c %ifctx if
+\c
+\c %ifctx if
\c %$ifnot:
\c %pop
-\c %elifctx else
+\c %elifctx else
\c %$ifend:
\c %pop
-\c %else
-\c %error "expected `if' or `else' before `endif'"
-\c %endif
+\c %else
+\c %error "expected `if' or `else' before `endif'"
+\c %endif
+\c
\c %endmacro
This code is more robust than the \c{REPEAT} and \c{UNTIL} macros
A sample usage of these macros might look like:
-\c cmp ax,bx
-\c if ae
-\c cmp bx,cx
-\c if ae
-\c mov ax,cx
-\c else
-\c mov ax,bx
-\c endif
-\c else
-\c cmp ax,cx
-\c if ae
-\c mov ax,cx
-\c endif
-\c endif
+\c cmp ax,bx
+\c
+\c if ae
+\c cmp bx,cx
+\c
+\c if ae
+\c mov ax,cx
+\c else
+\c mov ax,bx
+\c endif
+\c
+\c else
+\c cmp ax,cx
+\c
+\c if ae
+\c mov ax,cx
+\c endif
+\c
+\c endif
The block-\c{IF} macros handle nesting quite happily, by means of
pushing another context, describing the inner \c{if}, on top of the
line number in \c{EAX} and outputs something like `line 155: still
here'. You could then write a macro
-\c %macro notdeadyet 0
-\c push eax
-\c mov eax,__LINE__
-\c call stillhere
-\c pop eax
+\c %macro notdeadyet 0
+\c
+\c push eax
+\c mov eax,__LINE__
+\c call stillhere
+\c pop eax
+\c
\c %endmacro
and then pepper your code with calls to \c{notdeadyet} until you
For example, to define a structure called \c{mytype} containing a
longword, a word, a byte and a string of bytes, you might code
-\c struc mytype
-\c mt_long: resd 1
-\c mt_word: resw 1
-\c mt_byte: resb 1
-\c mt_str: resb 32
-\c endstruc
+\c struc mytype
+\c
+\c mt_long: resd 1
+\c mt_word: resw 1
+\c mt_byte: resb 1
+\c mt_str: resb 32
+\c
+\c endstruc
The above code defines six symbols: \c{mt_long} as 0 (the offset
from the beginning of a \c{mytype} structure to the longword field),
mechanism: if your structure members tend to have the same names in
more than one structure, you can define the above structure like this:
-\c struc mytype
-\c .long: resd 1
-\c .word: resw 1
-\c .byte: resb 1
-\c .str: resb 32
-\c endstruc
+\c struc mytype
+\c
+\c .long: resd 1
+\c .word: resw 1
+\c .byte: resb 1
+\c .str: resb 32
+\c
+\c endstruc
This defines the offsets to the structure fields as \c{mytype.long},
\c{mytype.word}, \c{mytype.byte} and \c{mytype.str}.
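+
+As a short sketch of how these offsets might be used (assuming \c{EBX}
+already holds the address of a \c{mytype} instance), each field is
+reached by adding its offset to the base address:
+
+\c         mov     eax,[ebx+mytype.long]   ; load the longword field
+\c         mov     cx,[ebx+mytype.word]    ; load the word field
+\c         mov     dl,[ebx+mytype.byte]    ; load the byte field
+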
mechanism. To declare a structure of type \c{mytype} in a program,
you code something like this:
-\c mystruc: istruc mytype
-\c at mt_long, dd 123456
-\c at mt_word, dw 1024
-\c at mt_byte, db 'x'
-\c at mt_str, db 'hello, world', 13, 10, 0
-\c iend
+\c mystruc:
+\c istruc mytype
+\c
+\c at mt_long, dd 123456
+\c at mt_word, dw 1024
+\c at mt_byte, db 'x'
+\c at mt_str, db 'hello, world', 13, 10, 0
+\c
+\c iend
The function of the \c{AT} macro is to make use of the \c{TIMES}
prefix to advance the assembly position to the correct point for the
line to specify, the remaining source lines can easily come after
the \c{AT} line. For example:
-\c at mt_str, db 123,134,145,156,167,178,189
-\c db 190,100,0
+\c at mt_str, db 123,134,145,156,167,178,189
+\c db 190,100,0
Depending on personal taste, you can also omit the code part of the
\c{AT} line completely, and start the structure field on the next
line:
-\c at mt_str
-\c db 'hello, world'
-\c db 13,10,0
+\c at mt_str
+\c db 'hello, world'
+\c db 13,10,0
\S{align} \i\c{ALIGN} and \i\c{ALIGNB}: Data Alignment
(Some assemblers call this directive \i\c{EVEN}.) The syntax of the
\c{ALIGN} and \c{ALIGNB} macros is
-\c align 4 ; align on 4-byte boundary
-\c align 16 ; align on 16-byte boundary
-\c align 8,db 0 ; pad with 0s rather than NOPs
-\c align 4,resb 1 ; align to 4 in the BSS
-\c alignb 4 ; equivalent to previous line
+\c align 4 ; align on 4-byte boundary
+\c align 16 ; align on 16-byte boundary
+\c align 8,db 0 ; pad with 0s rather than NOPs
+\c align 4,resb 1 ; align to 4 in the BSS
+\c alignb 4 ; equivalent to previous line
Both macros require their first argument to be a power of two; they
both compute the number of additional bytes required to bring the
\c{ALIGNB} (or \c{ALIGN} with a second argument of \c{RESB 1}) can
be used within structure definitions:
-\c struc mytype2
-\c mt_byte: resb 1
-\c alignb 2
-\c mt_word: resw 1
-\c alignb 4
-\c mt_long: resd 1
-\c mt_str: resb 32
-\c endstruc
+\c struc mytype2
+\c
+\c mt_byte:
+\c resb 1
+\c alignb 2
+\c mt_word:
+\c resw 1
+\c alignb 4
+\c mt_long:
+\c resd 1
+\c mt_str:
+\c resb 32
+\c
+\c endstruc
This will ensure that the structure members are sensibly aligned
relative to the base of the structure.
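+
+Working through the example above, and assuming the usual \c{STRUC}
+behaviour of starting at offset zero, the resulting layout would be
+equivalent to these definitions:
+
+\c mt_byte         equ     0       ; one byte, then one byte of padding
+\c mt_word         equ     2       ; two bytes; offset 4 needs no padding
+\c mt_long         equ     4       ; four bytes
+\c mt_str          equ     8       ; 32 bytes
+\c mytype2_size    equ     40      ; total size, including padding
+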
which shows the use of \c{%arg} without any external macros:
\c some_function:
-\c %push mycontext ; save the current context
-\c %stacksize large ; tell NASM to use bp
-\c %arg i:word, j_ptr:word
-\c mov ax,[i]
-\c mov bx,[j_ptr]
-\c add ax,[bx]
-\c ret
-\c %pop ; restore original context
+\c
+\c %push mycontext ; save the current context
+\c %stacksize large ; tell NASM to use bp
+\c %arg i:word, j_ptr:word
+\c
+\c mov ax,[i]
+\c mov bx,[j_ptr]
+\c add ax,[bx]
+\c ret
+\c
+\c %pop ; restore original context
This is similar to the procedure defined in \k{16cmacro} and adds
the value in i to the value pointed to by j_ptr and returns the
An example of its use is the following:
\c silly_swap:
-\c %push mycontext ; save the current context
-\c %stacksize small ; tell NASM to use bp
-\c %assign %$localsize 0 ; see text for explanation
-\c %local old_ax:word, old_dx:word
-\c enter %$localsize,0 ; see text for explanation
-\c mov [old_ax],ax ; swap ax & bx
-\c mov [old_dx],dx ; and swap dx & cx
-\c mov ax,bx
-\c mov dx,cx
-\c mov bx,[old_ax]
-\c mov cx,[old_dx]
-\c leave ; restore old bp
-\c ret ;
-\c %pop ; restore original context
+\c
+\c %push mycontext ; save the current context
+\c %stacksize small ; tell NASM to use bp
+\c %assign %$localsize 0 ; see text for explanation
+\c %local old_ax:word, old_dx:word
+\c
+\c enter %$localsize,0 ; see text for explanation
+\c mov [old_ax],ax ; swap ax & bx
+\c mov [old_dx],dx ; and swap dx & cx
+\c mov ax,bx
+\c mov dx,cx
+\c mov bx,[old_ax]
+\c mov cx,[old_dx]
+\c leave ; restore old bp
+\c ret ;
+\c
+\c %pop ; restore original context
The \c{%$localsize} variable is used internally by the
\c{%local} directive and \e{must} be defined within the
\c{[SECTION]} directive which it is about to issue, and then issues
it. So the user-level directive
-\c SECTION .text
+\c SECTION .text
expands to the two lines
-\c %define __SECT__ [SECTION .text]
-\c [SECTION .text]
+\c %define __SECT__ [SECTION .text]
+\c [SECTION .text]
Users may find it useful to make use of this in their own macros.
For example, the \c{writefile} macro defined in \k{mlmacgre} can be
usefully rewritten in the following more sophisticated form:
-\c %macro writefile 2+
-\c [section .data]
-\c %%str: db %2
-\c %%endstr:
-\c __SECT__
-\c mov dx,%%str
-\c mov cx,%%endstr-%%str
-\c mov bx,%1
-\c mov ah,0x40
-\c int 0x21
+\c %macro writefile 2+
+\c
+\c [section .data]
+\c
+\c %%str: db %2
+\c %%endstr:
+\c
+\c __SECT__
+\c
+\c mov dx,%%str
+\c mov cx,%%endstr-%%str
+\c mov bx,%1
+\c mov ah,0x40
+\c int 0x21
+\c
\c %endmacro
This form of the macro, once passed a string to output, first
\c{ABSOLUTE} is used as follows:
-\c absolute 0x1A
-\c kbuf_chr resw 1
-\c kbuf_free resw 1
-\c kbuf resw 16
+\c absolute 0x1A
+\c
+\c kbuf_chr resw 1
+\c kbuf_free resw 1
+\c kbuf resw 16
This example describes a section of the PC BIOS data area, at
segment address 0x40: the above code defines \c{kbuf_chr} to be
expression}: see \k{crit}) and it can be a value in a segment. For
example, a TSR can re-use its setup code as run-time BSS like this:
-\c org 100h ; it's a .COM program
-\c jmp setup ; setup code comes last
-\c ; the resident part of the TSR goes here
-\c setup: ; now write the code that installs the TSR here
-\c absolute setup
-\c runtimevar1 resw 1
-\c runtimevar2 resd 20
+\c org 100h ; it's a .COM program
+\c
+\c jmp setup ; setup code comes last
+\c
+\c ; the resident part of the TSR goes here
+\c setup:
+\c ; now write the code that installs the TSR here
+\c
+\c absolute setup
+\c
+\c runtimevar1 resw 1
+\c runtimevar2 resd 20
+\c
\c tsr_end:
This defines some variables `on top of' the setup code, so that
The \c{EXTERN} directive takes as many arguments as you like. Each
argument is the name of a symbol:
-\c extern _printf
-\c extern _sscanf,_fscanf
+\c extern _printf
+\c extern _sscanf,_fscanf
Some object-file formats provide extra features to the \c{EXTERN}
directive. In all cases, the extra features are used by suffixing a
default segment base of an external should be the group \c{dgroup}
by means of the directive
-\c extern _variable:wrt dgroup
+\c extern _variable:wrt dgroup
The primitive form of \c{EXTERN} differs from the user-level form
only in that it can take only one argument at a time: the support
refer to symbols which \e{are} defined in the same module as the
\c{GLOBAL} directive. For example:
-\c global _main
-\c _main: ; some code
+\c global _main
+\c _main:
+\c ; some code
\c{GLOBAL}, like \c{EXTERN}, allows object formats to define private
extensions by means of a colon. The \c{elf} object format, for
example, lets you specify whether global data items are functions or
data:
-\c global hashlookup:function, hashtable:data
+\c global hashlookup:function, hashtable:data
Like \c{EXTERN}, the primitive form of \c{GLOBAL} differs from the
user-level form only in that it can take only one argument at a
A common variable is much like a global variable declared in the
uninitialised data section, so that
-\c common intvar 4
+\c common intvar 4
is similar in function to
-\c global intvar
-\c section .bss
-\c intvar resd 1
+\c global intvar
+\c section .bss
+\c
+\c intvar resd 1
The difference is that if more than one module defines the same
common variable, then at link time those variables will be
variables to be NEAR or FAR, and the \c{elf} format allows you to
specify the alignment requirements of a common variable:
-\c common commvar 4:near ; works in OBJ
-\c common intarray 100:4 ; works in ELF: 4 byte aligned
+\c common commvar 4:near ; works in OBJ
+\c common intarray 100:4 ; works in ELF: 4 byte aligned
Once again, like \c{EXTERN} and \c{GLOBAL}, the primitive form of
\c{COMMON} differs from the user-level form only in that it can take
\b\c{CPU PENTIUM} Same as 586
-\b\c{CPU 686} Pentium Pro instruction set
+\b\c{CPU 686} P6 instruction set
\b\c{CPU PPRO} Same as 686
-\b\c{CPU P2} Pentium II instruction set
+\b\c{CPU P2} Same as 686
\b\c{CPU P3} Pentium III and Katmai instruction sets
For example, the following code will generate the longword
\c{0x00000104}:
-\c org 0x100
-\c dd label
+\c org 0x100
+\c dd label
\c label:
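As a sketch of where the value \c{0x00000104} comes from, the same fragment can be annotated with the address of each item. This assumes the \c{bin} output format; the trailing \c{dw} line is an addition for illustration only and is not part of the example above.

```nasm
        org     0x100           ; the program will be loaded at 0x100
        dd      label           ; occupies 0x100..0x103, holds 0x00000104
label:                          ; first address after the dd: 0x100 + 4
        dw      label - $$      ; illustration only: assembles as 4, the
                                ; distance of `label' from the section start
```

The \c{ORG} value shifts every address in the section by the same amount, so the difference \c{label - $$} stays at 4 whatever origin is chosen.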
Unlike the \c{ORG} directive provided by MASM-compatible assemblers,
segments. This is done by appending the \i\c{ALIGN} qualifier to the
end of the section-definition line. For example,
-\c section .data align=16
+\c section .data align=16
switches to the section \c{.data} and also specifies that it must be
aligned on a 16-byte boundary.
segment name as a symbol as well, so that you can access the segment
address of the segment. So, for example:
-\c segment data
-\c dvar: dw 1234
-\c segment code
-\c function: mov ax,data ; get segment address of data
-\c mov ds,ax ; and move it into DS
-\c inc word [dvar] ; now this reference will work
-\c ret
+\c segment data
+\c
+\c dvar: dw 1234
+\c
+\c segment code
+\c
+\c function:
+\c mov ax,data ; get segment address of data
+\c mov ds,ax ; and move it into DS
+\c inc word [dvar] ; now this reference will work
+\c ret
The \c{obj} format also enables the use of the \i\c{SEG} and
\i\c{WRT} operators, so that you can write code which does things
like
-\c extern foo
-\c mov ax,seg foo ; get preferred segment of foo
-\c mov ds,ax
-\c mov ax,data ; a different segment
-\c mov es,ax
-\c mov ax,[ds:foo] ; this accesses `foo'
-\c mov [es:foo wrt data],bx ; so does this
+\c extern foo
+\c
+\c mov ax,seg foo ; get preferred segment of foo
+\c mov ds,ax
+\c mov ax,data ; a different segment
+\c mov es,ax
+\c mov ax,[ds:foo] ; this accesses `foo'
+\c mov [es:foo wrt data],bx ; so does this
\S{objseg} \c{obj} Extensions to the \c{SEGMENT}
you are defining. This is done by appending extra qualifiers to the
end of the segment-definition line. For example,
-\c segment code private align=16
+\c segment code private align=16
defines the segment \c{code}, but also declares it to be a private
segment, and requires that the portion of it described in this code
a group. NASM therefore supplies the \c{GROUP} directive, whereby
you can code
-\c segment data
-\c ; some data
-\c segment bss
-\c ; some uninitialised data
-\c group dgroup data bss
+\c segment data
+\c
+\c ; some data
+\c
+\c segment bss
+\c
+\c ; some uninitialised data
+\c
+\c group dgroup data bss
which will define a group called \c{dgroup} to contain the segments
\c{data} and \c{bss}. Like \c{SEGMENT}, \c{GROUP} causes the group
wish to import and the name of the library you wish to import it
from. For example:
-\c import WSAStartup wsock32.dll
+\c import WSAStartup wsock32.dll
A third optional parameter gives the name by which the symbol is
known in the library you are importing it from, in case this is not
the same as the name you wish the symbol to be known by to your code
once you have imported it. For example:
-\c import asyncsel wsock32.dll WSAAsyncSelect
+\c import asyncsel wsock32.dll WSAAsyncSelect
\S{export} \i\c{EXPORT}: Exporting DLL Symbols\I{DLL symbols,
For example:
-\c export myfunc
-\c export myfunc TheRealMoreFormalLookingFunctionName
-\c export myfunc myfunc 1234 ; export by ordinal
-\c export myfunc myfunc resident parm=23 nodata
+\c export myfunc
+\c export myfunc TheRealMoreFormalLookingFunctionName
+\c export myfunc myfunc 1234 ; export by ordinal
+\c export myfunc myfunc resident parm=23 nodata
\S{dotdotstart} \i\c{..start}: Defining the \i{Program Entry
If you declare an external symbol with the directive
-\c extern foo
+\c extern foo
then references such as \c{mov ax,foo} will give you the offset of
\c{foo} from its preferred segment base (as specified in whichever
module \c{foo} is actually defined in). So to access the contents of
\c{foo} you will usually need to do something like
-\c mov ax,seg foo ; get preferred segment base
-\c mov es,ax ; move it into ES
-\c mov ax,[es:foo] ; and use offset `foo' from it
+\c mov ax,seg foo ; get preferred segment base
+\c mov es,ax ; move it into ES
+\c mov ax,[es:foo] ; and use offset `foo' from it
This is a little unwieldy, particularly if you know that an external
is going to be accessible from a given segment or group, say
\c{dgroup}. So if \c{DS} already contained \c{dgroup}, you could
simply code
-\c mov ax,[foo wrt dgroup]
+\c mov ax,[foo wrt dgroup]
However, having to type this every time you want to access \c{foo}
can be a pain; so NASM allows you to declare \c{foo} in the
alternative form
-\c extern foo:wrt dgroup
+\c extern foo:wrt dgroup
This form causes NASM to pretend that the preferred segment base of
\c{foo} is in fact \c{dgroup}; so the expression \c{seg foo} will
common variables} or far\I{far common variables}; NASM allows you to
specify which your variables should be by the use of the syntax
-\c common nearvar 2:near ; `nearvar' is a near common
-\c common farvar 10:far ; and `farvar' is far
+\c common nearvar 2:near ; `nearvar' is a near common
+\c common farvar 10:far ; and `farvar' is far
Far common variables may be greater in size than 64KB, and so the
OMF specification says that they are declared as a number of
the element size on your far common variables. This is done by the
following syntax:
-\c common c_5by2 10:far 5 ; two five-byte elements
-\c common c_2by5 10:far 2 ; five two-byte elements
+\c common c_5by2 10:far 5 ; two five-byte elements
+\c common c_2by5 10:far 2 ; five two-byte elements
If no element size is specified, the default is 1. Also, the \c{FAR}
keyword is not required when an element size is specified, since
only far commons may have element sizes at all. So the above
declarations could equivalently be
-\c common c_5by2 10:5 ; two five-byte elements
-\c common c_2by5 10:2 ; five two-byte elements
+\c common c_5by2 10:5 ; two five-byte elements
+\c common c_2by5 10:2 ; five two-byte elements
In addition to these extensions, the \c{COMMON} directive in \c{obj}
also supports default-\c{WRT} specification like \c{EXTERN} does
(explained in \k{objextern}). So you can also declare things like
-\c common foo 10:wrt dgroup
-\c common bar 16:far 2:wrt data
-\c common baz 24:wrt data:6
+\c common foo 10:wrt dgroup
+\c common bar 16:far 2:wrt data
+\c common baz 24:wrt data:6
\H{win32fmt} \i\c{win32}: Microsoft Win32 Object Files
The defaults assumed by NASM if you do not specify the above
qualifiers are:
-\c section .text code align=16
-\c section .data data align=4
-\c section .rdata rdata align=8
-\c section .bss bss align=4
+\c section .text code align=16
+\c section .data data align=4
+\c section .rdata rdata align=8
+\c section .bss bss align=4
Any other section name is treated by default like \c{.text}.
The defaults assumed by NASM if you do not specify the above
qualifiers are:
-\c section .text progbits alloc exec nowrite align=16
-\c section .data progbits alloc noexec write align=4
-\c section .bss nobits alloc noexec write align=4
-\c section other progbits alloc noexec nowrite align=1
+\c section .text progbits alloc exec nowrite align=16
+\c section .data progbits alloc noexec write align=4
+\c section .bss nobits alloc noexec write align=4
+\c section other progbits alloc noexec nowrite align=1
(Any section name other than \c{.text}, \c{.data} and \c{.bss} is
treated by default like \c{other} in the above code.)
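For instance, a section that should not take the \c{other} defaults can state its qualifiers explicitly. The following is a minimal sketch; the section names \c{consts} and \c{scratch} and the labels in them are hypothetical.

```nasm
section .text                   ; standard name, so the defaults apply
                                ; (progbits alloc exec nowrite align=16)

section consts progbits alloc noexec nowrite align=4
                                ; same as the `other' defaults, except
                                ; for the 4-byte alignment
limit:  dd      1000

section scratch nobits alloc noexec write align=16
                                ; uninitialised and writable, like .bss
buffer: resb    256
```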
\i\c{function} or \i\c{data}. (\i\c{object} is a synonym for
\c{data}.) For example:
-\c global hashlookup:function, hashtable:data
+\c global hashlookup:function, hashtable:data
exports the global symbol \c{hashlookup} as a function and
\c{hashtable} as a data object.
symbol, as a numeric expression (which may involve labels, and even
forward references) after the type specifier. Like this:
-\c global hashtable:data (hashtable.end - hashtable)
+\c global hashtable:data (hashtable.end - hashtable)
+\c
\c hashtable:
-\c db this,that,theother ; some data here
+\c db this,that,theother ; some data here
\c .end:
This makes NASM automatically calculate the length of the table and
\k{picglobal}.
-\S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive\I{COMMON,
-elf extensions to}
+\S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive
+\I{COMMON, elf extensions to}
\c{ELF} also allows you to specify alignment requirements \I{common
variables, alignment in elf}\I{alignment, of elf common variables}on
separated (as usual) by a colon. For example, an array of
doublewords would benefit from 4-byte alignment:
-\c common dwordarray 128:4
+\c common dwordarray 128:4
This declares the total size of the array to be 128 bytes, and
requires that it be aligned on a 4-byte boundary.
This is done by the \c{LIBRARY} directive, which takes one argument
which is the name of the module:
-\c library mylib.rdl
+\c library mylib.rdl
\S{rdfmod} Specifying a Module Name: The \i\c{MODULE} Directive
linking. The \c{MODULE} directive takes one argument, which is the name
of the current module:
-\c module mymodname
+\c module mymodname
Note that when you statically link modules and tell the linker to strip
the symbols from the output file, all module names will be stripped too.
To avoid this, you should start module names with \I{$prefix}\c{$}, like this:
-\c module $kernel.core
+\c module $kernel.core
\S{rdfglob} \c{rdf} Extensions to the \c{GLOBAL} directive\I{GLOBAL,
Suffixing the name with a colon and the word \i\c{export} makes the
symbol exported:
-\c global sys_open:export
+\c global sys_open:export
To specify that an exported symbol is a procedure (function), add the
word \i\c{proc} or \i\c{function} after the declaration:
-\c global sys_open:export proc
+\c global sys_open:export proc
Similarly, to specify an exported data object, add the word \i\c{data}
or \i\c{object} to the directive:
-\c global kernel_ticks:export data
+\c global kernel_ticks:export data
\H{dbgfmt} \i\c{dbg}: Debugging Format
also provided in the \I{test subdirectory}\c{test} subdirectory of
the NASM archives, under the name \c{objexe.asm}.
-\c segment code
+\c segment code
\c
-\c ..start: mov ax,data
-\c mov ds,ax
-\c mov ax,stack
-\c mov ss,ax
-\c mov sp,stacktop
+\c ..start:
+\c mov ax,data
+\c mov ds,ax
+\c mov ax,stack
+\c mov ss,ax
+\c mov sp,stacktop
This initial piece of code sets up \c{DS} to point to the data
segment, and initialises \c{SS} and \c{SP} to point to the top of
beginning of this code, which means that will be the entry point
into the resulting executable file.
-\c mov dx,hello
-\c mov ah,9
-\c int 0x21
+\c mov dx,hello
+\c mov ah,9
+\c int 0x21
The above is the main program: load \c{DS:DX} with a pointer to the
greeting message (\c{hello} is implicitly relative to the segment
\c{data}, which was loaded into \c{DS} in the setup code, so the
full pointer is valid), and call the DOS print-string function.
-\c mov ax,0x4c00
-\c int 0x21
+\c mov ax,0x4c00
+\c int 0x21
This terminates the program using another DOS system call.
-\c segment data
-\c hello: db 'hello, world', 13, 10, '$'
+\c segment data
+\c
+\c hello: db 'hello, world', 13, 10, '$'
The data segment contains the string we want to display.
-\c segment stack stack
-\c resb 64
+\c segment stack stack
+\c resb 64
\c stacktop:
The above code declares a stack segment containing 64 bytes of
write a \c{.COM} program, you would create a source file looking
like
-\c org 100h
-\c section .text
-\c start: ; put your code here
-\c section .data
-\c ; put data items here
-\c section .bss
-\c ; put uninitialised data here
+\c org 100h
+\c
+\c section .text
+\c
+\c start:
+\c ; put your code here
+\c
+\c section .data
+\c
+\c ; put data items here
+\c
+\c section .bss
+\c
+\c ; put uninitialised data here
The \c{bin} format puts the \c{.text} section first in the file, so
you can declare data or BSS items before beginning to write code if
If you find the underscores inconvenient, you can define macros to
replace the \c{GLOBAL} and \c{EXTERN} directives as follows:
-\c %macro cglobal 1
-\c global _%1
-\c %define %1 _%1
+\c %macro cglobal 1
+\c
+\c global _%1
+\c %define %1 _%1
+\c
\c %endmacro
-
-\c %macro cextern 1
-\c extern _%1
-\c %define %1 _%1
+\c
+\c %macro cextern 1
+\c
+\c extern _%1
+\c %define %1 _%1
+\c
\c %endmacro
(These forms of the macros only take one argument at a time; a
If you then declare an external like this:
-\c cextern printf
+\c cextern printf
then the macro will expand it as
-\c extern _printf
+\c extern _printf
\c %define printf _printf
Thereafter, you can reference \c{printf} as if it was a symbol, and
Thus, you would define a function in C style in the following way.
The following example is for small model:
-\c global _myfunc
-\c _myfunc: push bp
-\c mov bp,sp
-\c sub sp,0x40 ; 64 bytes of local stack space
-\c mov bx,[bp+4] ; first parameter to function
-\c ; some more code
-\c mov sp,bp ; undo "sub sp,0x40" above
-\c pop bp
-\c ret
+\c global _myfunc
+\c
+\c _myfunc:
+\c push bp
+\c mov bp,sp
+\c sub sp,0x40 ; 64 bytes of local stack space
+\c mov bx,[bp+4] ; first parameter to function
+\c
+\c ; some more code
+\c
+\c mov sp,bp ; undo "sub sp,0x40" above
+\c pop bp
+\c ret
For a large-model function, you would replace \c{RET} by \c{RETF},
and look for the first parameter at \c{[BP+6]} instead of
At the other end of the process, to call a C function from your
assembly code, you would do something like this:
-\c extern _printf
-\c ; and then, further down...
-\c push word [myint] ; one of my integer variables
-\c push word mystring ; pointer into my data segment
-\c call _printf
-\c add sp,byte 4 ; `byte' saves space
-\c ; then those data items...
-\c segment _DATA
-\c myint dw 1234
-\c mystring db 'This number -> %d <- should be 1234',10,0
+\c extern _printf
+\c
+\c ; and then, further down...
+\c
+\c push word [myint] ; one of my integer variables
+\c push word mystring ; pointer into my data segment
+\c call _printf
+\c add sp,byte 4 ; `byte' saves space
+\c
+\c ; then those data items...
+\c
+\c segment _DATA
+\c
+\c myint dw 1234
+\c mystring db 'This number -> %d <- should be 1234',10,0
This piece of code is the small-model assembly equivalent of the C
code
base of the segment \c{_DATA}. If not, you would have to initialise
it first.
-\c push word [myint]
-\c push word seg mystring ; Now push the segment, and...
-\c push word mystring ; ... offset of "mystring"
-\c call far _printf
-\c add sp,byte 6
+\c push word [myint]
+\c push word seg mystring ; Now push the segment, and...
+\c push word mystring ; ... offset of "mystring"
+\c call far _printf
+\c add sp,byte 6
The integer value still takes up one word on the stack, since large
model does not affect the size of the \c{int} data type. The first
in \k{16cunder}.) Thus, a C variable declared as \c{int i} can be
accessed from assembler as
-\c extern _i
-\c mov ax,[_i]
+\c extern _i
+\c
+\c mov ax,[_i]
And to declare your own integer variable which C programs can access
as \c{extern int j}, you do this (making sure you are assembling in
the \c{_DATA} segment, if necessary):
-\c global _j
-\c _j dw 0
+\c global _j
+\c
+\c _j dw 0
To access a C array, you need to know the size of the components of
the array. For example, \c{int} variables are two bytes long, so if
An example of an assembly function using the macro set is given
here:
-\c proc _nearproc
-\c %$i arg
-\c %$j arg
-\c mov ax,[bp + %$i]
-\c mov bx,[bp + %$j]
-\c add ax,[bx]
-\c endproc
+\c proc _nearproc
+\c
+\c %$i arg
+\c %$j arg
+\c mov ax,[bp + %$i]
+\c mov bx,[bp + %$j]
+\c add ax,[bx]
+\c
+\c endproc
This defines \c{_nearproc} to be a procedure taking two arguments,
the first (\c{i}) an integer and the second (\c{j}) a pointer to an
The large-model equivalent of the above function would look like this:
\c %define FARCODE
-\c proc _farproc
-\c %$i arg
-\c %$j arg 4
-\c mov ax,[bp + %$i]
-\c mov bx,[bp + %$j]
-\c mov es,[bp + %$j + 2]
-\c add ax,[bx]
-\c endproc
+\c
+\c proc _farproc
+\c
+\c %$i arg
+\c %$j arg 4
+\c mov ax,[bp + %$i]
+\c mov bx,[bp + %$j]
+\c mov es,[bp + %$j + 2]
+\c add ax,[bx]
+\c
+\c endproc
This makes use of the argument to the \c{arg} macro to define a
parameter of size 4, because \c{j} is now a far pointer. When we
Thus, you would define a function in Pascal style, taking two
\c{Integer}-type parameters, in the following way:
-\c global myfunc
-\c myfunc: push bp
-\c mov bp,sp
-\c sub sp,0x40 ; 64 bytes of local stack space
-\c mov bx,[bp+8] ; first parameter to function
-\c mov bx,[bp+6] ; second parameter to function
-\c ; some more code
-\c mov sp,bp ; undo "sub sp,0x40" above
-\c pop bp
-\c retf 4 ; total size of params is 4
+\c global myfunc
+\c
+\c myfunc: push bp
+\c mov bp,sp
+\c sub sp,0x40 ; 64 bytes of local stack space
+\c mov bx,[bp+8] ; first parameter to function
+\c mov bx,[bp+6] ; second parameter to function
+\c
+\c ; some more code
+\c
+\c mov sp,bp ; undo "sub sp,0x40" above
+\c pop bp
+\c retf 4 ; total size of params is 4
At the other end of the process, to call a Pascal function from your
assembly code, you would do something like this:
-\c extern SomeFunc
-\c ; and then, further down...
-\c push word seg mystring ; Now push the segment, and...
-\c push word mystring ; ... offset of "mystring"
-\c push word [myint] ; one of my variables
-\c call far SomeFunc
+\c extern SomeFunc
+\c
+\c ; and then, further down...
+\c
+\c push word seg mystring ; Now push the segment, and...
+\c push word mystring ; ... offset of "mystring"
+\c push word [myint] ; one of my variables
+\c call far SomeFunc
This is equivalent to the Pascal code
reverse order. For example:
\c %define PASCAL
-\c proc _pascalproc
-\c %$j arg 4
-\c %$i arg
-\c mov ax,[bp + %$i]
-\c mov bx,[bp + %$j]
-\c mov es,[bp + %$j + 2]
-\c add ax,[bx]
-\c endproc
+\c
+\c proc _pascalproc
+\c
+\c %$j arg 4
+\c %$i arg
+\c mov ax,[bp + %$i]
+\c mov bx,[bp + %$j]
+\c mov es,[bp + %$j + 2]
+\c add ax,[bx]
+\c
+\c endproc
This defines the same routine, conceptually, as the example in
\k{16cmacro}: it defines a function taking two arguments, an integer
Thus, you would define a function in C style in the following way:
-\c global _myfunc
-\c _myfunc: push ebp
-\c mov ebp,esp
-\c sub esp,0x40 ; 64 bytes of local stack space
-\c mov ebx,[ebp+8] ; first parameter to function
-\c ; some more code
-\c leave ; mov esp,ebp / pop ebp
-\c ret
+\c global _myfunc
+\c
+\c _myfunc:
+\c push ebp
+\c mov ebp,esp
+\c sub esp,0x40 ; 64 bytes of local stack space
+\c mov ebx,[ebp+8] ; first parameter to function
+\c
+\c ; some more code
+\c
+\c leave ; mov esp,ebp / pop ebp
+\c ret
At the other end of the process, to call a C function from your
assembly code, you would do something like this:
-\c extern _printf
-\c ; and then, further down...
-\c push dword [myint] ; one of my integer variables
-\c push dword mystring ; pointer into my data segment
-\c call _printf
-\c add esp,byte 8 ; `byte' saves space
-\c ; then those data items...
-\c segment _DATA
-\c myint dd 1234
-\c mystring db 'This number -> %d <- should be 1234',10,0
+\c extern _printf
+\c
+\c ; and then, further down...
+\c
+\c push dword [myint] ; one of my integer variables
+\c push dword mystring ; pointer into my data segment
+\c call _printf
+\c add esp,byte 8 ; `byte' saves space
+\c
+\c ; then those data items...
+\c
+\c segment _DATA
+\c
+\c myint dd 1234
+\c mystring db 'This number -> %d <- should be 1234',10,0
This piece of code is the assembly equivalent of the C code
An example of an assembly function using the macro set is given
here:
-\c proc _proc32
-\c %$i arg
-\c %$j arg
-\c mov eax,[ebp + %$i]
-\c mov ebx,[ebp + %$j]
-\c add eax,[ebx]
-\c endproc
+\c proc _proc32
+\c
+\c %$i arg
+\c %$j arg
+\c mov eax,[ebp + %$i]
+\c mov ebx,[ebp + %$j]
+\c add eax,[ebx]
+\c
+\c endproc
This defines \c{_proc32} to be a procedure taking two arguments, the
first (\c{i}) an integer and the second (\c{j}) a pointer to an
Therefore, you cannot get at your variables by writing code like
this:
-\c mov eax,[myvar] ; WRONG
+\c mov eax,[myvar] ; WRONG
Instead, the linker provides an area of memory called the
\i\e{global offset table}, or \i{GOT}; the GOT is situated at a
Each code module in your shared library should define the GOT as an
external symbol:
-\c extern _GLOBAL_OFFSET_TABLE_ ; in ELF
-\c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out
+\c extern _GLOBAL_OFFSET_TABLE_ ; in ELF
+\c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out
At the beginning of any function in your shared library which plans
to access your data or BSS sections, you must first calculate the
address of the GOT. This is typically done by writing the function
in this form:
-\c func: push ebp
-\c mov ebp,esp
-\c push ebx
-\c call .get_GOT
-\c .get_GOT: pop ebx
-\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
-\c ; the function body comes here
-\c mov ebx,[ebp-4]
-\c mov esp,ebp
-\c pop ebp
-\c ret
+\c func: push ebp
+\c mov ebp,esp
+\c push ebx
+\c call .get_GOT
+\c .get_GOT:
+\c pop ebx
+\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
+\c
+\c ; the function body comes here
+\c
+\c mov ebx,[ebp-4]
+\c mov esp,ebp
+\c pop ebp
+\c ret
(For BSD, again, the symbol \c{_GLOBAL_OFFSET_TABLE_} requires a
second leading underscore.)
obtain the address of the GOT by any other means, so you can put
those three instructions into a macro and safely ignore them:
-\c %macro get_GOT 0
-\c call %%getgot
-\c %%getgot: pop ebx
-\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
+\c %macro get_GOT 0
+\c
+\c call %%getgot
+\c %%getgot:
+\c pop ebx
+\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
+\c
\c %endmacro
\S{piclocal} Finding Your Local Data Items
relocation}\c{..gotoff} special \I\c{WRT ..gotoff}\c{WRT} type. The
way this works is like this:
-\c lea eax,[ebx+myvar wrt ..gotoff]
+\c lea eax,[ebx+myvar wrt ..gotoff]
The expression \c{myvar wrt ..gotoff} is calculated, when the shared
library is linked, to be the offset to the local variable \c{myvar}
to obtain the address of an external variable \c{extvar} in \c{EAX},
you would code
-\c mov eax,[ebx+extvar wrt ..got]
+\c mov eax,[ebx+extvar wrt ..got]
This loads the address of \c{extvar} out of an entry in the GOT. The
linker, when it builds the shared library, collects together every
So to export a function to users of the library, you must use
-\c global func:function ; declare it as a function
-\c func: push ebp
-\c ; etc.
+\c global func:function ; declare it as a function
+\c
+\c func: push ebp
+\c
+\c ; etc.
And to export a data item such as an array, you would have to code
-\c global array:data array.end-array ; give the size too
-\c array: resd 128
+\c global array:data array.end-array ; give the size too
+\c
+\c array: resd 128
\c .end:
Be careful: If you export a variable to the library user, by
one of your data sections, you can't do it by means of the standard
sort of code:
-\c dataptr: dd global_data_item ; WRONG
+\c dataptr: dd global_data_item ; WRONG
NASM will interpret this code as an ordinary relocation, in which
\c{global_data_item} is merely an offset from the beginning of the
Instead of the above code, then, you must write
-\c dataptr: dd global_data_item wrt ..sym
+\c dataptr: dd global_data_item wrt ..sym
which makes use of the special \c{WRT} type \I\c{WRT ..sym}\c{..sym}
to instruct NASM to search the symbol table for a particular symbol
Either method will work for functions: referring to one of your
functions by means of
-\c funcptr: dd my_function
+\c funcptr: dd my_function
will give the user the address of the code you wrote, whereas
-\c funcptr: dd my_function wrt ..sym
+\c funcptr: dd my_function wrt ..sym
will give the address of the procedure linkage table for the
function, which is where the calling program will \e{believe} the
segment is a 32-bit one. However, it must be assembled in a 16-bit
segment, so just coding, for example,
-\c jmp 0x1234:0x56789ABC ; wrong!
+\c jmp 0x1234:0x56789ABC ; wrong!
will not work, since the offset part of the address will be
truncated to \c{0x9ABC} and the jump will be an ordinary 16-bit far
\c{DB} instructions. NASM can go one better than that, by actually
generating the right instruction itself. Here's how to do it right:
-\c jmp dword 0x1234:0x56789ABC ; right
+\c jmp dword 0x1234:0x56789ABC ; right
\I\c{JMP DWORD}The \c{DWORD} prefix (strictly speaking, it should
come \e{after} the colon, since it is declaring the \e{offset} field
You can do the reverse operation, jumping from a 32-bit segment to a
16-bit one, by means of the \c{WORD} prefix:
-\c jmp word 0x8765:0x4321 ; 32 to 16 bit
+\c jmp word 0x8765:0x4321 ; 32 to 16 bit
If the \c{WORD} prefix is specified in 16-bit mode, or the \c{DWORD}
prefix in 32-bit mode, they will be ignored, since each is
the address, since any effective address containing a 32-bit
register is forced to be a 32-bit address. So you can do
-\c mov eax,offset_into_32_bit_segment_specified_by_fs
-\c mov dword [fs:eax],0x11223344
+\c mov eax,offset_into_32_bit_segment_specified_by_fs
+\c mov dword [fs:eax],0x11223344
This is fine, but slightly cumbersome (since it wastes an
instruction and a register) if you already know the precise offset
It can. As in \k{mixjump}, you need only prefix the address with the
\c{DWORD} keyword, and it will be forced to be a 32-bit address:
-\c mov dword [fs:dword my_offset],0x11223344
+\c mov dword [fs:dword my_offset],0x11223344
Also as in \k{mixjump}, NASM is not fussy about whether the
\c{DWORD} prefix comes before or after the segment override, so
arguably a nicer-looking way to code the above instruction is
-\c mov dword [dword fs:my_offset],0x11223344
+\c mov dword [dword fs:my_offset],0x11223344
Don't confuse the \c{DWORD} prefix \e{outside} the square brackets,
which controls the size of the data stored at the address, with the
one \e{inside} the square brackets, which controls the length of the
address itself. The two can quite easily be different:
-\c mov word [dword 0x12345678],0x9ABC
+\c mov word [dword 0x12345678],0x9ABC
This moves 16 bits of data to an address specified by a 32-bit
offset.
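The two sizes can be combined either way round. As a sketch, assuming the code is assembled in a 16-bit segment (\c{BITS 16}), the following pair shows which prefix byte is needed in each case. The addresses and values are arbitrary.

```nasm
        bits    16

        ; 16 bits of data, 32-bit offset: needs an address-size prefix (67h)
        mov     word [dword 0x12345678],0x9ABC  ; 67 C7 05 78 56 34 12 BC 9A

        ; 32 bits of data, 16-bit offset: needs an operand-size prefix (66h)
        mov     dword [word 0x1234],0x11223344  ; 66 C7 06 34 12 44 33 22 11
```

In the second instruction the \c{WORD} inside the brackets is redundant in a 16-bit segment; it is written out here only to make the contrast with the first instruction explicit.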
You can also specify \c{WORD} or \c{DWORD} prefixes along with the
\c{FAR} prefix to indirect far jumps or calls. For example:
-\c call dword far [fs:word 0x4321]
+\c call dword far [fs:word 0x4321]
This instruction contains an address specified by a 16-bit offset;
it loads a 48-bit far pointer from that (16-bit segment and 32-bit
be accessing a string in a 32-bit segment, you should load the
desired address into \c{ESI} and then code
-\c a32 lodsb
+\c a32 lodsb
The prefix forces the addressing size to 32 bits, meaning that
\c{LODSB} loads from \c{[DS:ESI]} instead of \c{[DS:SI]}. To access
the 16-bit behaviour of segment-register push and pop instructions,
you can use the operand-size prefix \i\c{o16}:
-\c o16 push ss
-\c o16 push ds
+\c o16 push ss
+\c o16 push ds
This code saves a doubleword of stack space by fitting two segment
registers into the space which would normally be consumed by pushing
place the \c{0xAA55} signature word at the end of a 512-byte boot
sector, people who are used to MASM tend to code
-\c ORG 0
-\c ; some boot sector code
-\c ORG 510
-\c DW 0xAA55
+\c ORG 0
+\c
+\c ; some boot sector code
+\c
+\c ORG 510
+\c DW 0xAA55
This is not the intended use of the \c{ORG} directive in NASM, and
will not work. The correct way to solve this problem in NASM is to
use the \i\c{TIMES} directive, like this:
-\c ORG 0
-\c ; some boot sector code
-\c TIMES 510-($-$$) DB 0
-\c DW 0xAA55
+\c ORG 0
+\c
+\c ; some boot sector code
+\c
+\c TIMES 510-($-$$) DB 0
+\c DW 0xAA55
The \c{TIMES} directive will insert exactly enough zero bytes into
the output to move the assembly point up to 510. This method also
The other common problem with the above code arises when people write
the \c{TIMES} line as
-\c TIMES 510-$ DB 0
+\c TIMES 510-$ DB 0
by reasoning that \c{$} should be a pure number, just like 510, so
the difference between them is also a pure number and can happily be
The solution, as in the previous section, is to code the \c{TIMES}
line in the form
-\c TIMES 510-($-$$) DB 0
+\c TIMES 510-($-$$) DB 0
in which \c{$} and \c{$$} are offsets from the same section base,
and so their difference is a pure number. This will solve the
The instructions that use this will give details of what the various
mnemonics are; this table is intended to help you work out the details
of what is happening.
-
-Predi- imm8 Description Relation where: Emula- Result if QNaN
- cate Encod- A Is 1st Operand tion NaN Signals
- ing B Is 2nd Operand Operand Invalid
-
-EQ 000B equal A = B False No
-
-LT 001B less-than A < B False Yes
-
-LE 010B less-than- A <= B False Yes
- or-equal
-
---- ---- greater A > B Swap False Yes
- than Operands,
- Use LT
-
---- ---- greater- A >= B Swap False Yes
- than-or-equal Operands,
- Use LE
-
-UNORD 011B unordered A, B = Unordered True No
-
-NEQ 100B not-equal A != B True No
-
-NLT 101B not-less- NOT(A < B) True Yes
- than
-
-NLE 110B not-less- NOT(A <= B) True Yes
- than-or-
- equal
-
---- ---- not-greater NOT(A > B) Swap True Yes
- than Operands,
- Use NLT
---- ---- not-greater NOT(A >= B) Swap True Yes
- than- Operands,
- or-equal Use NLE
-
-ORD 111B ordered A , B = Ordered False No
+\c Predi-  imm8    Description    Relation where:   Emula-     Result   QNaN
+\c cate    Encod-                 A Is 1st Operand  tion       if NaN   Signal
+\c         ing                    B Is 2nd Operand             Operand  Invalid
+\c
+\c EQ      000B    equal          A = B                        False    No
+\c
+\c LT      001B    less-than      A < B                        False    Yes
+\c
+\c LE      010B    less-than-     A <= B                       False    Yes
+\c                 or-equal
+\c
+\c ---     ----    greater        A > B             Swap       False    Yes
+\c                 than                             Operands,
+\c                                                  Use LT
+\c
+\c ---     ----    greater-       A >= B            Swap       False    Yes
+\c                 than-or-equal                    Operands,
+\c                                                  Use LE
+\c
+\c UNORD   011B    unordered      A, B = Unordered             True     No
+\c
+\c NEQ     100B    not-equal      A != B                       True     No
+\c
+\c NLT     101B    not-less-      NOT(A < B)                   True     Yes
+\c                 than
+\c
+\c NLE     110B    not-less-      NOT(A <= B)                  True     Yes
+\c                 than-or-
+\c                 equal
+\c
+\c ---     ----    not-greater    NOT(A > B)        Swap       True     Yes
+\c                 than                             Operands,
+\c                                                  Use NLT
+\c
+\c ---     ----    not-greater    NOT(A >= B)       Swap       True     Yes
+\c                 than-                            Operands,
+\c                 or-equal                         Use NLE
+\c
+\c ORD     111B    ordered        A, B = Ordered               False    No
The unordered relationship is true when at least one of the two
values being compared is a NaN or in an unsupported format.
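As a sketch of how the table is used, the following fragment compares two packed single-precision operands with \c{CMPPS}, passing the predicate as the \c{imm8} encoding from the table. The register assignment (A in \c{XMM0}, B in \c{XMM1}) and the use of \c{XMM2} and \c{XMM3} as scratch registers are assumptions made for the example. Since there is no encoding for greater-than, the second comparison swaps the operands and uses LT, as the emulation column suggests.

```nasm
        ; A is in xmm0, B is in xmm1 (assumed for this example)

        movaps  xmm2,xmm0       ; work on a copy of A
        cmpps   xmm2,xmm1,1     ; LT (001B): each element of xmm2 becomes
                                ; all ones where A < B, all zeros otherwise

        movaps  xmm3,xmm1       ; work on a copy of B
        cmpps   xmm3,xmm0,1     ; B < A, i.e. A > B: operands swapped, LT used
```

Because both comparisons use LT, either one gives a false result (all zeros) for an element where A or B is a NaN, as the table shows.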