From 6b2ea2eb8d2a5feffbef364e318bfc6455be02b3 Mon Sep 17 00:00:00 2001 From: Debbie Wiles Date: Fri, 10 May 2002 20:59:11 +0000 Subject: [PATCH] *** empty log message *** --- doc/nasmdoc.src | 1456 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 828 insertions(+), 628 deletions(-) diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src index 5e29afe..131cf29 100644 --- a/doc/nasmdoc.src +++ b/doc/nasmdoc.src @@ -816,13 +816,13 @@ practical, for the user to look at a single line of NASM code and tell what opcode is generated by it. You can't do this in MASM: if you declare, for example, -\c foo equ 1 -\c bar dw 2 +\c foo equ 1 +\c bar dw 2 then the two lines of code -\c mov ax,foo -\c mov ax,bar +\c mov ax,foo +\c mov ax,bar generate completely different opcodes, despite having identical-looking syntaxes. @@ -1004,11 +1004,11 @@ can use NASM's native single-operand forms in most cases. Details of all forms of each supported instruction are given in \k{iref}. For example, you can code: -\c fadd st1 ; this sets st0 := st0 + st1 -\c fadd st0,st1 ; so does this +\c fadd st1 ; this sets st0 := st0 + st1 +\c fadd st0,st1 ; so does this \c -\c fadd st1,st0 ; this sets st1 := st1 + st0 -\c fadd to st1 ; so does this +\c fadd st1,st0 ; this sets st1 := st1 + st0 +\c fadd to st1 ; so does this Almost any floating-point instruction that references memory must use one of the prefixes \i\c{DWORD}, \i\c{QWORD} or \i\c{TWORD} to @@ -1033,18 +1033,18 @@ as in MASM, to declare initialised data in the output file. 
They can be invoked in a wide range of ways: \I{floating-point}\I{character constant}\I{string constant} -\c db 0x55 ; just the byte 0x55 -\c db 0x55,0x56,0x57 ; three bytes in succession -\c db 'a',0x55 ; character constants are OK -\c db 'hello',13,10,'$' ; so are string constants -\c dw 0x1234 ; 0x34 0x12 -\c dw 'a' ; 0x41 0x00 (it's just a number) -\c dw 'ab' ; 0x41 0x42 (character constant) -\c dw 'abc' ; 0x41 0x42 0x43 0x00 (string) -\c dd 0x12345678 ; 0x78 0x56 0x34 0x12 -\c dd 1.234567e20 ; floating-point constant -\c dq 1.234567e20 ; double-precision float -\c dt 1.234567e20 ; extended-precision float +\c db 0x55 ; just the byte 0x55 +\c db 0x55,0x56,0x57 ; three bytes in succession +\c db 'a',0x55 ; character constants are OK +\c db 'hello',13,10,'$' ; so are string constants +\c dw 0x1234 ; 0x34 0x12 +\c dw 'a' ; 0x41 0x00 (it's just a number) +\c dw 'ab' ; 0x41 0x42 (character constant) +\c dw 'abc' ; 0x41 0x42 0x43 0x00 (string) +\c dd 0x12345678 ; 0x78 0x56 0x34 0x12 +\c dd 1.234567e20 ; floating-point constant +\c dq 1.234567e20 ; double-precision float +\c dt 1.234567e20 ; extended-precision float \c{DQ} and \c{DT} do not accept \i{numeric constants} or string constants as operands. @@ -1064,9 +1064,9 @@ similar things: this is what it does instead. The operand to a For example: -\c buffer: resb 64 ; reserve 64 bytes -\c wordvar: resw 1 ; reserve a word -\c realarray resq 10 ; array of ten reals +\c buffer: resb 64 ; reserve 64 bytes +\c wordvar: resw 1 ; reserve a word +\c realarray resq 10 ; array of ten reals \S{incbin} \i\c{INCBIN}: Including External \i{Binary Files} @@ -1077,10 +1077,10 @@ handy for (for example) including \i{graphics} and \i{sound} data directly into a game executable file. 
It can be called in one of these three ways: -\c incbin "file.dat" ; include the whole file -\c incbin "file.dat",1024 ; skip the first 1024 bytes -\c incbin "file.dat",1024,512 ; skip the first 1024, and -\c ; actually include at most 512 +\c incbin "file.dat" ; include the whole file +\c incbin "file.dat",1024 ; skip the first 1024 bytes +\c incbin "file.dat",1024,512 ; skip the first 1024, and +\c ; actually include at most 512 \S{equ} \i\c{EQU}: Defining Constants @@ -1091,8 +1091,8 @@ to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example, -\c message db 'hello, world' -\c msglen equ $-message +\c message db 'hello, world' +\c msglen equ $-message defines \c{msglen} to be the constant 12. \c{msglen} may not then be redefined later. This is not a \i{preprocessor} definition either: @@ -1111,20 +1111,20 @@ times. This is partly present as NASM's equivalent of the \i\c{DUP} syntax supported by \i{MASM}-compatible assemblers, in that you can code -\c zerobuf: times 64 db 0 +\c zerobuf: times 64 db 0 or similar things; but \c{TIMES} is more versatile than that. The argument to \c{TIMES} is not just a numeric constant, but a numeric \e{expression}, so you can do things like -\c buffer: db 'hello, world' -\c times 64-$+buffer db ' ' +\c buffer: db 'hello, world' +\c times 64-$+buffer db ' ' which will store exactly enough spaces to make the total length of \c{buffer} up to 64. Finally, \c{TIMES} can be applied to ordinary instructions, so you can code trivial \i{unrolled loops} in it: -\c times 100 movsb +\c times 100 movsb Note that there is no effective difference between \c{times 100 resb 1} and \c{resb 100}, except that the latter will be assembled about @@ -1148,10 +1148,10 @@ have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in \i{square brackets}. 
For example: -\c wordvar dw 123 -\c mov ax,[wordvar] -\c mov ax,[wordvar+1] -\c mov ax,[es:wordvar+bx] +\c wordvar dw 123 +\c mov ax,[wordvar] +\c mov ax,[wordvar+1] +\c mov ax,[es:wordvar+bx] Anything not conforming to this simple system is not a valid memory reference in NASM, for example \c{es:wordvar[bx]}. @@ -1159,15 +1159,15 @@ reference in NASM, for example \c{es:wordvar[bx]}. More complicated effective addresses, such as those involving more than one register, work in exactly the same way: -\c mov eax,[ebx*2+ecx+offset] -\c mov ax,[bp+di+8] +\c mov eax,[ebx*2+ecx+offset] +\c mov ax,[bp+di+8] NASM is capable of doing \i{algebra} on these effective addresses, so that things which don't necessarily \e{look} legal are perfectly all right: -\c mov eax,[ebx*5] ; assembles as [ebx*4+ebx] -\c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)] +\c mov eax,[ebx*5] ; assembles as [ebx*4+ebx] +\c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)] Some forms of effective address have more than one assembled form; in most such cases NASM will generate the smallest form it can. For @@ -1220,12 +1220,12 @@ sign must have a digit after the \c{$} rather than a letter. Some examples: -\c mov ax,100 ; decimal -\c mov ax,0a2h ; hex -\c mov ax,$0a2 ; hex again: the 0 is required -\c mov ax,0xa2 ; hex yet again -\c mov ax,777q ; octal -\c mov ax,10010011b ; binary +\c mov ax,100 ; decimal +\c mov ax,0a2h ; hex +\c mov ax,$0a2 ; hex again: the 0 is required +\c mov ax,0xa2 ; hex yet again +\c mov ax,777q ; octal +\c mov ax,10010011b ; binary \S{chrconst} \i{Character Constants} @@ -1258,14 +1258,14 @@ A string constant looks like a character constant, only longer. It is treated as a concatenation of maximum-size character constants for the conditions. 
So the following are equivalent: -\c db 'hello' ; string constant -\c db 'h','e','l','l','o' ; equivalent character constants +\c db 'hello' ; string constant +\c db 'h','e','l','l','o' ; equivalent character constants And the following are also equivalent: -\c dd 'ninechars' ; doubleword string constant -\c dd 'nine','char','s' ; becomes three doublewords -\c db 'ninechars',0,0,0 ; and really looks like this +\c dd 'ninechars' ; doubleword string constant +\c dd 'nine','char','s' ; becomes three doublewords +\c db 'ninechars',0,0,0 ; and really looks like this Note that when used as an operand to \c{db}, a constant like \c{'ab'} is treated as a string constant despite being short enough @@ -1287,11 +1287,11 @@ floating-point constant. Some examples: -\c dd 1.2 ; an easy one -\c dq 1.e10 ; 10,000,000,000 -\c dq 1.e+10 ; synonymous with 1.e10 -\c dq 1.e-10 ; 0.000 000 000 1 -\c dt 3.141592653589793238462 ; pi +\c dd 1.2 ; an easy one +\c dq 1.e10 ; 10,000,000,000 +\c dq 1.e+10 ; synonymous with 1.e10 +\c dq 1.e-10 ; 0.000 000 000 1 +\c dt 3.141592653589793238462 ; pi NASM cannot do compile-time arithmetic on floating-point constants. This is because NASM is designed to be portable - although it always @@ -1400,9 +1400,9 @@ The \c{SEG} operator returns the \i\e{preferred} segment base of a symbol, defined as the segment base relative to which the offset of the symbol makes sense. So the code -\c mov ax,seg symbol -\c mov es,ax -\c mov bx,symbol +\c mov ax,seg symbol +\c mov es,ax +\c mov bx,symbol will load \c{ES:BX} with a valid pointer to the symbol \c{symbol}. @@ -1412,9 +1412,9 @@ want to refer to some symbol using a different segment base from the preferred one. NASM lets you do this, by the use of the \c{WRT} (With Reference To) keyword. 
So you can do things like -\c mov ax,weird_seg ; weird_seg is a segment base -\c mov es,ax -\c mov bx,symbol wrt weird_seg +\c mov ax,weird_seg ; weird_seg is a segment base +\c mov es,ax +\c mov bx,symbol wrt weird_seg to load \c{ES:BX} with a different, but functionally equivalent, pointer to the symbol \c{symbol}. @@ -1424,8 +1424,8 @@ syntax \c{call segment:offset}, where \c{segment} and \c{offset} both represent immediate values. So to call a far procedure, you could code either of -\c call (seg procedure):procedure -\c call weird_seg:(procedure wrt weird_seg) +\c call (seg procedure):procedure +\c call weird_seg:(procedure wrt weird_seg) (The parentheses are included for clarity, to show the intended parsing of the above instructions. They are not necessary in @@ -1438,7 +1438,7 @@ to \c{CALL} in these examples. To declare a \i{far pointer} to a data item in a data segment, you must code -\c dw symbol, seg symbol +\c dw symbol, seg symbol NASM supports no convenient synonym for this, though you can always invent one using the macro processor. @@ -1457,8 +1457,8 @@ code, knows all the symbol addresses the code refers to. So one thing NASM can't handle is code whose size depends on the value of a symbol declared after the code in question. For example, -\c times (label-$) db 0 -\c label: db 'Where am I?' +\c times (label-$) db 0 +\c label: db 'Where am I?' The argument to \i\c{TIMES} in this case could equally legally evaluate to anything at all; NASM will reject this example because @@ -1466,8 +1466,8 @@ it cannot tell the size of the \c{TIMES} line when it first sees it. It will just as firmly reject the slightly \I{paradox}paradoxical code -\c times (label-$+1) db 0 -\c label: db 'NOW where am I?' +\c times (label-$+1) db 0 +\c label: db 'NOW where am I?' in which \e{any} value for the \c{TIMES} argument is by definition wrong! @@ -1483,8 +1483,8 @@ also critical expressions. 
Critical expressions can crop up in other contexts as well: consider the following code. -\c mov ax,symbol1 -\c symbol1 equ symbol2 +\c mov ax,symbol1 +\c symbol1 equ symbol2 \c symbol2: On the first pass, NASM cannot determine the value of \c{symbol1}, @@ -1502,8 +1502,8 @@ NASM avoids this problem by defining the right-hand side of an There is a related issue involving \i{forward references}: consider this code fragment. -\c mov eax,[ebx+offset] -\c offset equ 10 +\c mov eax,[ebx+offset] +\c offset equ 10 NASM, on pass one, must calculate the size of the instruction \c{mov eax,[ebx+offset]} without knowing the value of \c{offset}. It has no @@ -1528,14 +1528,21 @@ A label beginning with a single period is treated as a \e{local} label, which means that it is associated with the previous non-local label. So, for example: -\c label1 ; some code -\c .loop ; some more code -\c jne .loop -\c ret -\c label2 ; some code -\c .loop ; some more code -\c jne .loop -\c ret +\c label1 ; some code +\c +\c .loop +\c ; some more code +\c +\c jne .loop +\c ret +\c +\c label2 ; some code +\c +\c .loop +\c ; some more code +\c +\c jne .loop +\c ret In the above code fragment, each \c{JNE} instruction jumps to the line immediately before it, because the two definitions of \c{.loop} @@ -1551,9 +1558,10 @@ really defining a symbol called \c{label1.loop}, and the second defines a symbol called \c{label2.loop}. So, if you really needed to, you could write -\c label3 ; some more code -\c ; and some more -\c jmp label1.loop +\c label3 ; some more code +\c ; and some more +\c +\c jmp label1.loop Sometimes it is useful - in a macro, for instance - to be able to define a label which can be referenced from anywhere but which @@ -1566,12 +1574,13 @@ probably only useful in macro definitions: if a label begins with the \I{label prefix}special prefix \i\c{..@}, then it does nothing to the local label mechanism. 
So you could code -\c label1: ; a non-local label -\c .local: ; this is really label1.local -\c ..@foo: ; this is a special symbol -\c label2: ; another non-local label -\c .local: ; this is really label2.local -\c jmp ..@foo ; this will jump three lines up +\c label1: ; a non-local label +\c .local: ; this is really label1.local +\c ..@foo: ; this is a special symbol +\c label2: ; another non-local label +\c .local: ; this is really label2.local +\c +\c jmp ..@foo ; this will jump three lines up NASM has the capacity to define other special symbols beginning with a double period: for example, \c{..start} is used to specify the @@ -1602,21 +1611,23 @@ Single-line macros are defined using the \c{%define} preprocessor directive. The definitions work in a similar way to C; so you can do things like -\c %define ctrl 0x1F & +\c %define ctrl 0x1F & \c %define param(a,b) ((a)+(a)*(b)) -\c mov byte [param(2,ebx)], ctrl 'D' +\c +\c mov byte [param(2,ebx)], ctrl 'D' which will expand to -\c mov byte [(2)+(2)*(ebx)], 0x1F & 'D' +\c mov byte [(2)+(2)*(ebx)], 0x1F & 'D' When the expansion of a single-line macro contains tokens which invoke another macro, the expansion is performed at invocation time, not at definition time. Thus the code -\c %define a(x) 1+b(x) -\c %define b(x) 2*x -\c mov ax,a(8) +\c %define a(x) 1+b(x) +\c %define b(x) 2*x +\c +\c mov ax,a(8) will evaluate in the expected way to \c{mov ax,1+2*8}, even though the macro \c{b} wasn't defined at the time of definition of \c{a}. @@ -1635,8 +1646,9 @@ a result of a previous expansion of the same macro, to guard against preprocessor will only expand the first occurrence of the macro. Hence, if you code -\c %define a(x) 1+a(x) -\c mov ax,a(3) +\c %define a(x) 1+a(x) +\c +\c mov ax,a(3) the macro \c{a(3)} will expand once, becoming \c{1+a(3)}, and will then expand no further. This behaviour can be useful: see \k{32c} @@ -1645,7 +1657,7 @@ for an example of its use. 
You can \I{overloading, single-line macros}overload single-line macros: if you write -\c %define foo(x) 1+x +\c %define foo(x) 1+x \c %define foo(x,y) 1+x*y the preprocessor will be able to handle both types of macro call, @@ -1684,31 +1696,31 @@ several similar macros that perform simlar functions. As an example, consider the following: -\c %define BDASTART 400h ; Start of BIOS data area +\c %define BDASTART 400h ; Start of BIOS data area -\c struc tBIOSDA ; its structure -\c .COM1addr RESW 1 -\c .COM2addr RESW 1 -\c ; ..and so on +\c struc tBIOSDA ; its structure +\c .COM1addr RESW 1 +\c .COM2addr RESW 1 +\c ; ..and so on \c endstruc Now, if we need to access the elements of tBIOSDA in different places, we can end up with: -\c mov ax,BDASTART + tBIOSDA.COM1addr -\c mov bx,BDASTART + tBIOSDA.COM2addr +\c mov ax,BDASTART + tBIOSDA.COM1addr +\c mov bx,BDASTART + tBIOSDA.COM2addr This will become pretty ugly (and tedious) if used in many places, and can be reduced in size significantly by using the following macro: \c ; Macro to access BIOS variables by their names (from tBDA): -\c %define BDA(x) BDASTART + tBIOSDA. %+ x +\c %define BDA(x) BDASTART + tBIOSDA. %+ x Now the above code can be written as: -\c mov ax,BDA(COM1addr) -\c mov bx,BDA(COM2addr) +\c mov ax,BDA(COM1addr) +\c mov bx,BDA(COM2addr) Using this feature, we can simplify references to a lot of macros (and, in turn, reduce typing errors). @@ -1721,7 +1733,8 @@ example, the following sequence: \c %define foo bar \c %undef foo -\c mov eax, foo +\c +\c mov eax, foo will expand to the instruction \c{mov eax, foo}, since after \c{%undef} the macro \c{foo} is no longer defined. @@ -1793,9 +1806,9 @@ assigned the value of 8. Individual letters in strings can be extracted using \c{%substr}. 
An example of its use is probably more useful than the description: -\c %substr mychar 'xyz' 1 ; equivalent to %define mychar 'x' -\c %substr mychar 'xyz' 2 ; equivalent to %define mychar 'y' -\c %substr mychar 'xyz' 3 ; equivalent to %define mychar 'z' +\c %substr mychar 'xyz' 1 ; equivalent to %define mychar 'x' +\c %substr mychar 'xyz' 2 ; equivalent to %define mychar 'y' +\c %substr mychar 'xyz' 3 ; equivalent to %define mychar 'z' In this example, mychar gets the value of 'y'. As with \c{%strlen} (see \k{strlen}), the first parameter is the single-line macro to @@ -1812,10 +1825,12 @@ Multi-line macros are much more like the type of macro seen in MASM and TASM: a multi-line macro definition in NASM looks something like this. -\c %macro prologue 1 -\c push ebp -\c mov ebp,esp -\c sub esp,%1 +\c %macro prologue 1 +\c +\c push ebp +\c mov ebp,esp +\c sub esp,%1 +\c \c %endmacro This defines a C-like function prologue as a macro: so you would @@ -1825,9 +1840,9 @@ invoke the macro with a call such as which would expand to the three lines of code -\c myfunc: push ebp -\c mov ebp,esp -\c sub esp,12 +\c myfunc: push ebp +\c mov ebp,esp +\c sub esp,12 The number \c{1} after the macro name in the \c{%macro} line defines the number of parameters the macro \c{prologue} expects to receive. @@ -1844,12 +1859,15 @@ multi-line macro, you can do that by enclosing the entire parameter in \I{braces, around macro parameters}braces. So you could code things like -\c %macro silly 2 -\c %2: db %1 +\c %macro silly 2 +\c +\c %2: db %1 +\c \c %endmacro -\c silly 'a', letter_a ; letter_a: db 'a' -\c silly 'ab', string_ab ; string_ab: db 'ab' -\c silly {13,10}, crlf ; crlf: db 13,10 +\c +\c silly 'a', letter_a ; letter_a: db 'a' +\c silly 'ab', string_ab ; string_ab: db 'ab' +\c silly {13,10}, crlf ; crlf: db 13,10 \S{mlmacover} \i{Overloading Multi-Line Macros} @@ -1859,9 +1877,11 @@ defining the same macro name several times with different numbers of parameters. 
This time, no exception is made for macros with no parameters at all. So you could define -\c %macro prologue 0 -\c push ebp -\c mov ebp,esp +\c %macro prologue 0 +\c +\c push ebp +\c mov ebp,esp +\c \c %endmacro to define an alternative form of the function prologue which @@ -1870,15 +1890,17 @@ allocates no local stack space. Sometimes, however, you might want to `overload' a machine instruction; for example, you might want to define -\c %macro push 2 -\c push %1 -\c push %2 +\c %macro push 2 +\c +\c push %1 +\c push %2 +\c \c %endmacro so that you could code -\c push ebx ; this line is not a macro call -\c push eax,ecx ; but this one is +\c push ebx ; this line is not a macro call +\c push eax,ecx ; but this one is Ordinarily, NASM will give a warning for the first of the above two lines, since \c{push} is now defined to be a macro, and is being @@ -1897,10 +1919,12 @@ each time. You do this by prefixing \i\c{%%} to the label name. So you can invent an instruction which executes a \c{RET} if the \c{Z} flag is set by doing this: -\c %macro retz 0 -\c jnz %%skip -\c ret -\c %%skip: +\c %macro retz 0 +\c +\c jnz %%skip +\c ret +\c %%skip: +\c \c %endmacro You can call this macro as many times as you want, and every time @@ -1922,7 +1946,7 @@ extracting one or two smaller parameters from the front. An example might be a macro to write a text string to a file in MS-DOS, where you might want to be able to write -\c writefile [filehandle],"hello, world",13,10 +\c writefile [filehandle],"hello, world",13,10 NASM allows you to define the last parameter of a macro to be \e{greedy}, meaning that if you invoke the macro with more @@ -1930,14 +1954,17 @@ parameters than it expects, all the spare parameters get lumped into the last defined one along with the separating commas. 
So if you code: -\c %macro writefile 2+ -\c jmp %%endstr -\c %%str: db %2 -\c %%endstr: mov dx,%%str -\c mov cx,%%endstr-%%str -\c mov bx,%1 -\c mov ah,0x40 -\c int 0x21 +\c %macro writefile 2+ +\c +\c jmp %%endstr +\c %%str: db %2 +\c %%endstr: +\c mov dx,%%str +\c mov cx,%%endstr-%%str +\c mov bx,%1 +\c mov ah,0x40 +\c int 0x21 +\c \c %endmacro then the example call to \c{writefile} above will work as expected: @@ -1978,10 +2005,12 @@ NASM also allows you to define a multi-line macro with a \e{range} of allowable parameter counts. If you do this, you can specify defaults for \i{omitted parameters}. So, for example: -\c %macro die 0-1 "Painful program death has occurred." -\c writefile 2,%1 -\c mov ax,0x4c01 -\c int 0x21 +\c %macro die 0-1 "Painful program death has occurred." +\c +\c writefile 2,%1 +\c mov ax,0x4c01 +\c int 0x21 +\c \c %endmacro This macro (which makes use of the \c{writefile} macro defined in @@ -2050,11 +2079,13 @@ parameters are rotated to the right. \I{iterating over macro parameters}So a pair of macros to save and restore a set of registers might work as follows: -\c %macro multipush 1-* -\c %rep %0 -\c push %1 -\c %rotate 1 -\c %endrep +\c %macro multipush 1-* +\c +\c %rep %0 +\c push %1 +\c %rotate 1 +\c %endrep +\c \c %endmacro This macro invokes the \c{PUSH} instruction on each of its arguments @@ -2079,11 +2110,13 @@ order from the one in which they were pushed. 
This can be done by the following definition: -\c %macro multipop 1-* -\c %rep %0 -\c %rotate -1 -\c pop %1 -\c %endrep +\c %macro multipop 1-* +\c +\c %rep %0 +\c %rotate -1 +\c pop %1 +\c %endrep +\c \c %endmacro This macro begins by rotating its arguments one place to the @@ -2102,9 +2135,12 @@ table of key codes along with offsets into the table, you could code something like \c %macro keytab_entry 2 -\c keypos%1 equ $-keytab -\c db %2 +\c +\c keypos%1 equ $-keytab +\c db %2 +\c \c %endmacro +\c \c keytab: \c keytab_entry F1,128+1 \c keytab_entry F2,128+2 @@ -2113,12 +2149,12 @@ something like which would expand to \c keytab: -\c keyposF1 equ $-keytab -\c db 128+1 -\c keyposF2 equ $-keytab -\c db 128+2 -\c keyposReturn equ $-keytab -\c db 13 +\c keyposF1 equ $-keytab +\c db 128+1 +\c keyposF2 equ $-keytab +\c db 128+2 +\c keyposReturn equ $-keytab +\c db 13 You can just as easily concatenate text on to the other end of a macro parameter, by writing \c{%1foo}. @@ -2158,10 +2194,12 @@ means of \i\c{%-1}, which NASM will expand as the \e{inverse} condition code. So the \c{retz} macro defined in \k{maclocal} can be replaced by a general \i{conditional-return macro} like this: -\c %macro retc 1 -\c j%-1 %%skip -\c ret -\c %%skip: +\c %macro retc 1 +\c +\c j%-1 %%skip +\c ret +\c %%skip: +\c \c %endmacro This macro can now be invoked using calls like \c{retc ne}, which @@ -2202,11 +2240,11 @@ file to be assembled only if certain conditions are met. The general syntax of this feature looks like this: \c %if -\c ; some code which only appears if is met +\c ; some code which only appears if is met \c %elif -\c ; only appears if is not met but is +\c ; only appears if is not met but is \c %else -\c ; this appears if neither nor was met +\c ; this appears if neither nor was met \c %endif The \i\c{%else} clause is optional, as is the \i\c{%elif} clause. @@ -2239,6 +2277,42 @@ definitions in \c{%elif} blocks by using \i\c{%elifdef} and \i\c{%elifndef}. 
+\S{ifmacro} \i\c{%ifmacro}: \i{Testing Multi-Line Macro Existence} + +The \c{%ifmacro} directive operates in the same way as the \c{%ifdef} +directive, except that it checks for the existence of a multi-line macro. + +For example, you may be working with a large project and not have control +over the macros in a library. You may want to create a macro with one +name if it doesn't already exist, and another name if one with that name +does exist. + +The \c{%ifmacro} directive is considered true if defining a macro with the given name +and number of arguments would cause a definition conflict. For example: + +\c %ifmacro MyMacro 1-3 +\c +\c %error "MyMacro 1-3" causes a conflict with an existing macro. +\c +\c %else +\c +\c %macro MyMacro 1-3 +\c +\c ; insert code to define the macro +\c +\c %endmacro +\c +\c %endif + +This will create the macro "MyMacro 1-3" if no macro already exists which +would conflict with it, and will report an error if there would be a definition +conflict. + +You can test for the macro not existing by using \i\c{%ifnmacro} instead +of \c{%ifmacro}. Additional tests can be performed in \c{%elif} blocks by using +\i\c{%elifmacro} and \i\c{%elifnmacro}. + + \S{ifctx} \i\c{%ifctx}: \i{Testing the Context Stack} The conditional-assembly construct \c{%ifctx ctxname} will cause the @@ -2291,13 +2365,15 @@ Differences in white space are not counted. 
For example, the following macro pushes a register or number on the stack, and allows you to treat \c{IP} as a real register: -\c %macro pushparam 1 -\c %ifidni %1,ip -\c call %%label -\c %%label: -\c %else -\c push %1 -\c %endif +\c %macro pushparam 1 +\c +\c %ifidni %1,ip +\c call %%label +\c %%label: +\c %else +\c push %1 +\c %endif +\c \c %endmacro Like most other \c{%if} constructs, \c{%ifidn} has a counterpart @@ -2325,29 +2401,31 @@ For example, the \c{writefile} macro defined in \k{mlmacgre} can be extended to take advantage of \c{%ifstr} in the following fashion: \c %macro writefile 2-3+ -\c %ifstr %2 -\c jmp %%endstr -\c %if %0 = 3 -\c %%str: db %2,%3 -\c %else -\c %%str: db %2 -\c %endif -\c %%endstr: mov dx,%%str -\c mov cx,%%endstr-%%str -\c %else -\c mov dx,%2 -\c mov cx,%3 -\c %endif -\c mov bx,%1 -\c mov ah,0x40 -\c int 0x21 +\c +\c %ifstr %2 +\c jmp %%endstr +\c %if %0 = 3 +\c %%str: db %2,%3 +\c %else +\c %%str: db %2 +\c %endif +\c %%endstr: mov dx,%%str +\c mov cx,%%endstr-%%str +\c %else +\c mov dx,%2 +\c mov cx,%3 +\c %endif +\c mov bx,%1 +\c mov ah,0x40 +\c int 0x21 +\c \c %endmacro Then the \c{writefile} macro can cope with being called in either of the following two ways: -\c writefile [file], strpointer, length -\c writefile [file], "hello", 13, 10 +\c writefile [file], strpointer, length +\c writefile [file], "hello", 13, 10 In the first, \c{strpointer} is used as the address of an already-declared string, and \c{length} is used as its length; in @@ -2374,11 +2452,11 @@ try to assemble your source files, you can ensure that they define the right macros by means of code like this: \c %ifdef SOME_MACRO -\c ; do some setup +\c ; do some setup \c %elifdef SOME_OTHER_MACRO -\c ; do some different setup +\c ; do some different setup \c %else -\c %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined. +\c %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined. 
\c %endif Then any user who fails to understand the way your code is supposed @@ -2400,8 +2478,8 @@ arguments) can be used to enclose a chunk of code, which is then replicated as many times as specified by the preprocessor: \c %assign i 0 -\c %rep 64 -\c inc word [table+2*i] +\c %rep 64 +\c inc word [table+2*i] \c %assign i i+1 \c %endrep @@ -2418,13 +2496,14 @@ terminate the loop, like this: \c %assign j 1 \c %rep 100 \c %if j > 65535 -\c %exitrep +\c %exitrep \c %endif -\c dw j +\c dw j \c %assign k j+i \c %assign i j \c %assign j k \c %endrep +\c \c fib_number equ ($-fibonacci)/2 This produces a list of all the Fibonacci numbers that will fit in @@ -2457,8 +2536,8 @@ once is just as applicable in NASM: if the file \c{macros.mac} has the form \c %ifndef MACROS_MAC -\c %define MACROS_MAC -\c ; now define some macros +\c %define MACROS_MAC +\c ; now define some macros \c %endif then including the file more than once will not cause errors, @@ -2494,7 +2573,7 @@ The \c{%push} directive is used to create a new context and place it on the top of the context stack. \c{%push} requires one argument, which is the name of the context. For example: -\c %push foobar +\c %push foobar This pushes a new context called \c{foobar} on the stack. You can have several contexts on the stack with the same name: they can @@ -2514,22 +2593,26 @@ of the context stack. So the \c{REPEAT} and \c{UNTIL} example given above could be implemented by means of: \c %macro repeat 0 -\c %push repeat -\c %$begin: +\c +\c %push repeat +\c %$begin: +\c \c %endmacro - +\c \c %macro until 1 -\c j%-1 %$begin -\c %pop +\c +\c j%-1 %$begin +\c %pop +\c \c %endmacro and invoked by means of, for example, -\c mov cx,string -\c repeat -\c add cx,3 -\c scasb -\c until e +\c mov cx,string +\c repeat +\c add cx,3 +\c scasb +\c until e which would scan every fourth byte of a string in search of the byte in \c{AL}. @@ -2564,7 +2647,7 @@ with a different name, without touching the associated macros and labels. 
So you could replace the destructive code \c %pop -\c %push newname +\c %push newname with the non-destructive version \c{%repl newname}. @@ -2576,30 +2659,36 @@ including the conditional-assembly construct \i\c{%ifctx}, to implement a block IF statement as a set of macros. \c %macro if 1 +\c \c %push if -\c j%-1 %$ifnot +\c j%-1 %$ifnot +\c \c %endmacro - +\c \c %macro else 0 -\c %ifctx if -\c %repl else -\c jmp %$ifend +\c +\c %ifctx if +\c %repl else +\c jmp %$ifend \c %$ifnot: -\c %else -\c %error "expected `if' before `else'" -\c %endif +\c %else +\c %error "expected `if' before `else'" +\c %endif +\c \c %endmacro - +\c \c %macro endif 0 -\c %ifctx if +\c +\c %ifctx if \c %$ifnot: \c %pop -\c %elifctx else +\c %elifctx else \c %$ifend: \c %pop -\c %else -\c %error "expected `if' or `else' before `endif'" -\c %endif +\c %else +\c %error "expected `if' or `else' before `endif'" +\c %endif +\c \c %endmacro This code is more robust than the \c{REPEAT} and \c{UNTIL} macros @@ -2622,20 +2711,25 @@ intervening \c{else}. It does this by the use of \c{%repl}. A sample usage of these macros might look like: -\c cmp ax,bx -\c if ae -\c cmp bx,cx -\c if ae -\c mov ax,cx -\c else -\c mov ax,bx -\c endif -\c else -\c cmp ax,cx -\c if ae -\c mov ax,cx -\c endif -\c endif +\c cmp ax,bx +\c +\c if ae +\c cmp bx,cx +\c +\c if ae +\c mov ax,cx +\c else +\c mov ax,bx +\c endif +\c +\c else +\c cmp ax,cx +\c +\c if ae +\c mov ax,cx +\c endif +\c +\c endif The block-\c{IF} macros handle nesting quite happily, by means of pushing another context, describing the inner \c{if}, on top of the @@ -2684,11 +2778,13 @@ example, one could write a routine \c{stillhere}, which is passed a line number in \c{EAX} and outputs something like `line 155: still here'. 
You could then write a macro -\c %macro notdeadyet 0 -\c push eax -\c mov eax,__LINE__ -\c call stillhere -\c pop eax +\c %macro notdeadyet 0 +\c +\c push eax +\c mov eax,__LINE__ +\c call stillhere +\c pop eax +\c \c %endmacro and then pepper your code with calls to \c{notdeadyet} until you @@ -2713,12 +2809,14 @@ using the \c{RESB} family of pseudo-instructions, and then invoke For example, to define a structure called \c{mytype} containing a longword, a word, a byte and a string of bytes, you might code -\c struc mytype -\c mt_long: resd 1 -\c mt_word: resw 1 -\c mt_byte: resb 1 -\c mt_str: resb 32 -\c endstruc +\c struc mytype +\c +\c mt_long: resd 1 +\c mt_word: resw 1 +\c mt_byte: resb 1 +\c mt_str: resb 32 +\c +\c endstruc The above code defines six symbols: \c{mt_long} as 0 (the offset from the beginning of a \c{mytype} structure to the longword field), @@ -2730,12 +2828,14 @@ effect of allowing structures to work with the local label mechanism: if your structure members tend to have the same names in more than one structure, you can define the above structure like this: -\c struc mytype -\c .long: resd 1 -\c .word: resw 1 -\c .byte: resb 1 -\c .str: resb 32 -\c endstruc +\c struc mytype +\c +\c .long: resd 1 +\c .word: resw 1 +\c .byte: resb 1 +\c .str: resb 32 +\c +\c endstruc This defines the offsets to the structure fields as \c{mytype.long}, \c{mytype.word}, \c{mytype.byte} and \c{mytype.str}. @@ -2758,12 +2858,15 @@ segment. NASM provides an easy way to do this in the \c{ISTRUC} mechanism. 
To declare a structure of type \c{mytype} in a program, you code something like this: -\c mystruc: istruc mytype -\c at mt_long, dd 123456 -\c at mt_word, dw 1024 -\c at mt_byte, db 'x' -\c at mt_str, db 'hello, world', 13, 10, 0 -\c iend +\c mystruc: +\c istruc mytype +\c +\c at mt_long, dd 123456 +\c at mt_word, dw 1024 +\c at mt_byte, db 'x' +\c at mt_str, db 'hello, world', 13, 10, 0 +\c +\c iend The function of the \c{AT} macro is to make use of the \c{TIMES} prefix to advance the assembly position to the correct point for the @@ -2775,16 +2878,16 @@ If the data to go in a structure field requires more than one source line to specify, the remaining source lines can easily come after the \c{AT} line. For example: -\c at mt_str, db 123,134,145,156,167,178,189 -\c db 190,100,0 +\c at mt_str, db 123,134,145,156,167,178,189 +\c db 190,100,0 Depending on personal taste, you can also omit the code part of the \c{AT} line completely, and start the structure field on the next line: -\c at mt_str -\c db 'hello, world' -\c db 13,10,0 +\c at mt_str +\c db 'hello, world' +\c db 13,10,0 \S{align} \i\c{ALIGN} and \i\c{ALIGNB}: Data Alignment @@ -2794,11 +2897,11 @@ align code or data on a word, longword, paragraph or other boundary. (Some assemblers call this directive \i\c{EVEN}.) The syntax of the \c{ALIGN} and \c{ALIGNB} macros is -\c align 4 ; align on 4-byte boundary -\c align 16 ; align on 16-byte boundary -\c align 8,db 0 ; pad with 0s rather than NOPs -\c align 4,resb 1 ; align to 4 in the BSS -\c alignb 4 ; equivalent to previous line +\c align 4 ; align on 4-byte boundary +\c align 16 ; align on 16-byte boundary +\c align 8,db 0 ; pad with 0s rather than NOPs +\c align 4,resb 1 ; align to 4 in the BSS +\c alignb 4 ; equivalent to previous line Both macros require their first argument to be a power of two; they both compute the number of additional bytes required to bring the @@ -2822,14 +2925,20 @@ thing. 
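The padding arithmetic described above for \c{ALIGN} and \c{ALIGNB} can be sketched in a few lines of Python. This is only an illustration, not NASM's own macro code: the names `align_padding` and `aligned` are hypothetical, the position is assumed to be an offset counted from the start of the current section, and the explicit power-of-two check is an addition of this sketch rather than something the text attributes to the macros.

```python
def align_padding(position: int, boundary: int) -> int:
    """Return the number of pad bytes needed to move `position` up to
    the next multiple of `boundary`.

    Assumption: `position` is an offset from the start of the section.
    """
    # Hypothetical safety check added for this sketch only.
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    # For a power-of-two boundary, masking the negated position gives
    # the distance to the next multiple (0 if already aligned).
    return -position & (boundary - 1)


def aligned(position: int, boundary: int) -> int:
    """Return the position after padding has been emitted."""
    return position + align_padding(position, boundary)


if __name__ == "__main__":
    for pos in (0, 1, 5, 8, 17):
        print(pos, "->", aligned(pos, 4), "with", align_padding(pos, 4), "pad bytes")
```

Under these assumptions, a one-byte field at offset 0 followed by a request for 2-byte alignment needs one pad byte, and a position that is already a multiple of the boundary needs none.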
\c{ALIGNB} (or \c{ALIGN} with a second argument of \c{RESB 1}) can be used within structure definitions: -\c struc mytype2 -\c mt_byte: resb 1 -\c alignb 2 -\c mt_word: resw 1 -\c alignb 4 -\c mt_long: resd 1 -\c mt_str: resb 32 -\c endstruc +\c struc mytype2 +\c +\c mt_byte: +\c resb 1 +\c alignb 2 +\c mt_word: +\c resw 1 +\c alignb 4 +\c mt_long: +\c resd 1 +\c mt_str: +\c resb 32 +\c +\c endstruc This will ensure that the structure members are sensibly aligned relative to the base of the structure. @@ -2868,14 +2977,17 @@ convenient to use and is not TASM compatible. Here is an example which shows the use of \c{%arg} without any external macros: \c some_function: -\c %push mycontext ; save the current context -\c %stacksize large ; tell NASM to use bp -\c %arg i:word, j_ptr:word -\c mov ax,[i] -\c mov bx,[j_ptr] -\c add ax,[bx] -\c ret -\c %pop ; restore original context +\c +\c %push mycontext ; save the current context +\c %stacksize large ; tell NASM to use bp +\c %arg i:word, j_ptr:word +\c +\c mov ax,[i] +\c mov bx,[j_ptr] +\c add ax,[bx] +\c ret +\c +\c %pop ; restore original context This is similar to the procedure defined in \k{16cmacro} and adds the value in i to the value pointed to by j_ptr and returns the @@ -2928,20 +3040,23 @@ instruction (see \k{insENTER} for a description of that instruction). 
An example of its use is the following: \c silly_swap: -\c %push mycontext ; save the current context -\c %stacksize small ; tell NASM to use bp -\c %assign %$localsize 0 ; see text for explanation -\c %local old_ax:word, old_dx:word -\c enter %$localsize,0 ; see text for explanation -\c mov [old_ax],ax ; swap ax & bx -\c mov [old_dx],dx ; and swap dx & cx -\c mov ax,bx -\c mov dx,cx -\c mov bx,[old_ax] -\c mov cx,[old_dx] -\c leave ; restore old bp -\c ret ; -\c %pop ; restore original context +\c +\c %push mycontext ; save the current context +\c %stacksize small ; tell NASM to use bp +\c %assign %$localsize 0 ; see text for explanation +\c %local old_ax:word, old_dx:word +\c +\c enter %$localsize,0 ; see text for explanation +\c mov [old_ax],ax ; swap ax & bx +\c mov [old_dx],dx ; and swap dx & cx +\c mov ax,bx +\c mov dx,cx +\c mov bx,[old_ax] +\c mov cx,[old_dx] +\c leave ; restore old bp +\c ret ; +\c +\c %pop ; restore original context The \c{%$localsize} variable is used internally by the \c{%local} directive and \e{must} be defined within the @@ -3049,27 +3164,32 @@ defines the single-line macro \c{__SECT__} to be the primitive \c{[SECTION]} directive which it is about to issue, and then issues it. So the user-level directive -\c SECTION .text +\c SECTION .text expands to the two lines -\c %define __SECT__ [SECTION .text] -\c [SECTION .text] +\c %define __SECT__ [SECTION .text] +\c [SECTION .text] Users may find it useful to make use of this in their own macros. 
For example, the \c{writefile} macro defined in \k{mlmacgre} can be usefully rewritten in the following more sophisticated form: -\c %macro writefile 2+ -\c [section .data] -\c %%str: db %2 -\c %%endstr: -\c __SECT__ -\c mov dx,%%str -\c mov cx,%%endstr-%%str -\c mov bx,%1 -\c mov ah,0x40 -\c int 0x21 +\c %macro writefile 2+ +\c +\c [section .data] +\c +\c %%str: db %2 +\c %%endstr: +\c +\c __SECT__ +\c +\c mov dx,%%str +\c mov cx,%%endstr-%%str +\c mov bx,%1 +\c mov ah,0x40 +\c int 0x21 +\c \c %endmacro This form of the macro, once passed a string to output, first @@ -3094,10 +3214,11 @@ mode are the \c{RESB} family. \c{ABSOLUTE} is used as follows: -\c absolute 0x1A -\c kbuf_chr resw 1 -\c kbuf_free resw 1 -\c kbuf resw 16 +\c absolute 0x1A +\c +\c kbuf_chr resw 1 +\c kbuf_free resw 1 +\c kbuf resw 16 This example describes a section of the PC BIOS data area, at segment address 0x40: the above code defines \c{kbuf_chr} to be @@ -3114,13 +3235,19 @@ argument: it can take an expression (actually, a \i{critical expression}: see \k{crit}) and it can be a value in a segment. For example, a TSR can re-use its setup code as run-time BSS like this: -\c org 100h ; it's a .COM program -\c jmp setup ; setup code comes last -\c ; the resident part of the TSR goes here -\c setup: ; now write the code that installs the TSR here -\c absolute setup -\c runtimevar1 resw 1 -\c runtimevar2 resd 20 +\c org 100h ; it's a .COM program +\c +\c jmp setup ; setup code comes last +\c +\c ; the resident part of the TSR goes here +\c setup: +\c ; now write the code that installs the TSR here +\c +\c absolute setup +\c +\c runtimevar1 resw 1 +\c runtimevar2 resd 20 +\c \c tsr_end: This defines some variables `on top of' the setup code, so that @@ -3142,8 +3269,8 @@ the \c{bin} format cannot. The \c{EXTERN} directive takes as many arguments as you like. 
Each argument is the name of a symbol: -\c extern _printf -\c extern _sscanf,_fscanf +\c extern _printf +\c extern _sscanf,_fscanf Some object-file formats provide extra features to the \c{EXTERN} directive. In all cases, the extra features are used by suffixing a @@ -3152,7 +3279,7 @@ For example, the \c{obj} format allows you to declare that the default segment base of an external should be the group \c{dgroup} by means of the directive -\c extern _variable:wrt dgroup +\c extern _variable:wrt dgroup The primitive form of \c{EXTERN} differs from the user-level form only in that it can take only one argument at a time: the support @@ -3178,15 +3305,16 @@ the definition of the symbol. refer to symbols which \e{are} defined in the same module as the \c{GLOBAL} directive. For example: -\c global _main -\c _main: ; some code +\c global _main +\c _main: +\c ; some code \c{GLOBAL}, like \c{EXTERN}, allows object formats to define private extensions by means of a colon. The \c{elf} object format, for example, lets you specify whether global data items are functions or data: -\c global hashlookup:function, hashtable:data +\c global hashlookup:function, hashtable:data Like \c{EXTERN}, the primitive form of \c{GLOBAL} differs from the user-level form only in that it can take only one argument at a @@ -3199,13 +3327,14 @@ The \c{COMMON} directive is used to declare \i\e{common variables}. A common variable is much like a global variable declared in the uninitialised data section, so that -\c common intvar 4 +\c common intvar 4 is similar in function to -\c global intvar -\c section .bss -\c intvar resd 1 +\c global intvar +\c section .bss +\c +\c intvar resd 1 The difference is that if more than one module defines the same common variable, then at link time those variables will be @@ -3217,8 +3346,8 @@ specific extensions. 
For example, the \c{obj} format allows common variables to be NEAR or FAR, and the \c{elf} format allows you to specify the alignment requirements of a common variable: -\c common commvar 4:near ; works in OBJ -\c common intarray 100:4 ; works in ELF: 4 byte aligned +\c common commvar 4:near ; works in OBJ +\c common intarray 100:4 ; works in ELF: 4 byte aligned Once again, like \c{EXTERN} and \c{GLOBAL}, the primitive form of \c{COMMON} differs from the user-level form only in that it can take @@ -3246,11 +3375,11 @@ Options are: \b\c{CPU PENTIUM} Same as 586 -\b\c{CPU 686} Pentium Pro instruction set +\b\c{CPU 686} P6 instruction set \b\c{CPU PPRO} Same as 686 -\b\c{CPU P2} Pentium II instruction set +\b\c{CPU P2} Same as 686 \b\c{CPU P3} Pentium III and Katmai instruction sets @@ -3322,8 +3451,8 @@ the program begins at when it is loaded into memory. For example, the following code will generate the longword \c{0x00000104}: -\c org 0x100 -\c dd label +\c org 0x100 +\c dd label \c label: Unlike the \c{ORG} directive provided by MASM-compatible assemblers, @@ -3343,7 +3472,7 @@ directive to allow you to specify the alignment requirements of segments. This is done by appending the \i\c{ALIGN} qualifier to the end of the section-definition line. For example, -\c section .data align=16 +\c section .data align=16 switches to the section \c{.data} and also specifies that it must be aligned on a 16-byte boundary. @@ -3381,25 +3510,30 @@ When you define a segment in an \c{obj} file, NASM defines the segment name as a symbol as well, so that you can access the segment address of the segment. 
So, for example: -\c segment data -\c dvar: dw 1234 -\c segment code -\c function: mov ax,data ; get segment address of data -\c mov ds,ax ; and move it into DS -\c inc word [dvar] ; now this reference will work -\c ret +\c segment data +\c +\c dvar: dw 1234 +\c +\c segment code +\c +\c function: +\c mov ax,data ; get segment address of data +\c mov ds,ax ; and move it into DS +\c inc word [dvar] ; now this reference will work +\c ret The \c{obj} format also enables the use of the \i\c{SEG} and \i\c{WRT} operators, so that you can write code which does things like -\c extern foo -\c mov ax,seg foo ; get preferred segment of foo -\c mov ds,ax -\c mov ax,data ; a different segment -\c mov es,ax -\c mov ax,[ds:foo] ; this accesses `foo' -\c mov [es:foo wrt data],bx ; so does this +\c extern foo +\c +\c mov ax,seg foo ; get preferred segment of foo +\c mov ds,ax +\c mov ax,data ; a different segment +\c mov es,ax +\c mov ax,[ds:foo] ; this accesses `foo' +\c mov [es:foo wrt data],bx ; so does this \S{objseg} \c{obj} Extensions to the \c{SEGMENT} @@ -3410,7 +3544,7 @@ directive to allow you to specify various properties of the segment you are defining. This is done by appending extra qualifiers to the end of the segment-definition line. For example, -\c segment code private align=16 +\c segment code private align=16 defines the segment \c{code}, but also declares it to be a private segment, and requires that the portion of it described in this code @@ -3472,11 +3606,15 @@ single segment register can be used to refer to all the segments in a group. NASM therefore supplies the \c{GROUP} directive, whereby you can code -\c segment data -\c ; some data -\c segment bss -\c ; some uninitialised data -\c group dgroup data bss +\c segment data +\c +\c ; some data +\c +\c segment bss +\c +\c ; some uninitialised data +\c +\c group dgroup data bss which will define a group called \c{dgroup} to contain the segments \c{data} and \c{bss}. 
Like \c{SEGMENT}, \c{GROUP} causes the group @@ -3528,14 +3666,14 @@ white space, which are (respectively) the name of the symbol you wish to import and the name of the library you wish to import it from. For example: -\c import WSAStartup wsock32.dll +\c import WSAStartup wsock32.dll A third optional parameter gives the name by which the symbol is known in the library you are importing it from, in case this is not the same as the name you wish the symbol to be known by to your code once you have imported it. For example: -\c import asyncsel wsock32.dll WSAAsyncSelect +\c import asyncsel wsock32.dll WSAAsyncSelect \S{export} \i\c{EXPORT}: Exporting DLL Symbols\I{DLL symbols, @@ -3577,10 +3715,10 @@ the desired number. For example: -\c export myfunc -\c export myfunc TheRealMoreFormalLookingFunctionName -\c export myfunc myfunc 1234 ; export by ordinal -\c export myfunc myfunc resident parm=23 nodata +\c export myfunc +\c export myfunc TheRealMoreFormalLookingFunctionName +\c export myfunc myfunc 1234 ; export by ordinal +\c export myfunc myfunc resident parm=23 nodata \S{dotdotstart} \i\c{..start}: Defining the \i{Program Entry @@ -3599,29 +3737,29 @@ Directive\I{EXTERN, obj extensions to} If you declare an external symbol with the directive -\c extern foo +\c extern foo then references such as \c{mov ax,foo} will give you the offset of \c{foo} from its preferred segment base (as specified in whichever module \c{foo} is actually defined in). So to access the contents of \c{foo} you will usually need to do something like -\c mov ax,seg foo ; get preferred segment base -\c mov es,ax ; move it into ES -\c mov ax,[es:foo] ; and use offset `foo' from it +\c mov ax,seg foo ; get preferred segment base +\c mov es,ax ; move it into ES +\c mov ax,[es:foo] ; and use offset `foo' from it This is a little unwieldy, particularly if you know that an external is going to be accessible from a given segment or group, say \c{dgroup}. 
So if \c{DS} already contained \c{dgroup}, you could simply code -\c mov ax,[foo wrt dgroup] +\c mov ax,[foo wrt dgroup] However, having to type this every time you want to access \c{foo} can be a pain; so NASM allows you to declare \c{foo} in the alternative form -\c extern foo:wrt dgroup +\c extern foo:wrt dgroup This form causes NASM to pretend that the preferred segment base of \c{foo} is in fact \c{dgroup}; so the expression \c{seg foo} will @@ -3641,8 +3779,8 @@ The \c{obj} format allows common variables to be either near\I{near common variables} or far\I{far common variables}; NASM allows you to specify which your variables should be by the use of the syntax -\c common nearvar 2:near ; `nearvar' is a near common -\c common farvar 10:far ; and `farvar' is far +\c common nearvar 2:near ; `nearvar' is a near common +\c common farvar 10:far ; and `farvar' is far Far common variables may be greater in size than 64Kb, and so the OMF specification says that they are declared as a number of @@ -3657,24 +3795,24 @@ in more than one module. Therefore NASM must allow you to specify the element size on your far common variables. This is done by the following syntax: -\c common c_5by2 10:far 5 ; two five-byte elements -\c common c_2by5 10:far 2 ; five two-byte elements +\c common c_5by2 10:far 5 ; two five-byte elements +\c common c_2by5 10:far 2 ; five two-byte elements If no element size is specified, the default is 1. Also, the \c{FAR} keyword is not required when an element size is specified, since only far commons may have element sizes at all. So the above declarations could equivalently be -\c common c_5by2 10:5 ; two five-byte elements -\c common c_2by5 10:2 ; five two-byte elements +\c common c_5by2 10:5 ; two five-byte elements +\c common c_2by5 10:2 ; five two-byte elements In addition to these extensions, the \c{COMMON} directive in \c{obj} also supports default-\c{WRT} specification like \c{EXTERN} does (explained in \k{objextern}). 
So you can also declare things like -\c common foo 10:wrt dgroup -\c common bar 16:far 2:wrt data -\c common baz 24:wrt data:6 +\c common foo 10:wrt dgroup +\c common bar 16:far 2:wrt data +\c common baz 24:wrt data:6 \H{win32fmt} \i\c{win32}: Microsoft Win32 Object Files @@ -3743,10 +3881,10 @@ alignment), though the value does not matter. The defaults assumed by NASM if you do not specify the above qualifiers are: -\c section .text code align=16 -\c section .data data align=4 -\c section .rdata rdata align=8 -\c section .bss bss align=4 +\c section .text code align=16 +\c section .data data align=4 +\c section .rdata rdata align=8 +\c section .bss bss align=4 Any other section name is treated by default like \c{.text}. @@ -3807,10 +3945,10 @@ requirements of the section. The defaults assumed by NASM if you do not specify the above qualifiers are: -\c section .text progbits alloc exec nowrite align=16 -\c section .data progbits alloc noexec write align=4 -\c section .bss nobits alloc noexec write align=4 -\c section other progbits alloc noexec nowrite align=1 +\c section .text progbits alloc exec nowrite align=16 +\c section .data progbits alloc noexec write align=4 +\c section .bss nobits alloc noexec write align=4 +\c section other progbits alloc noexec nowrite align=1 (Any section name other than \c{.text}, \c{.data} and \c{.bss} is treated by default like \c{other} in the above code.) @@ -3891,7 +4029,7 @@ object by suffixing the name with a colon and the word \i\c{function} or \i\c{data}. (\i\c{object} is a synonym for \c{data}.) For example: -\c global hashlookup:function, hashtable:data +\c global hashlookup:function, hashtable:data exports the global symbol \c{hashlookup} as a function and \c{hashtable} as a data object. @@ -3900,9 +4038,10 @@ You can also specify the size of the data associated with the symbol, as a numeric expression (which may involve labels, and even forward references) after the type specifier. 
Like this: -\c global hashtable:data (hashtable.end - hashtable) +\c global hashtable:data (hashtable.end - hashtable) +\c \c hashtable: -\c db this,that,theother ; some data here +\c db this,that,theother ; some data here \c .end: This makes NASM automatically calculate the length of the table and @@ -3913,8 +4052,8 @@ writing shared library code. For more information, see \k{picglobal}. -\S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive\I{COMMON, -elf extensions to} +\S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive +\I{COMMON, elf extensions to} \c{ELF} also allows you to specify alignment requirements \I{common variables, alignment in elf}\I{alignment, of elf common variables}on @@ -3923,7 +4062,7 @@ power of two) after the name and size of the common variable, separated (as usual) by a colon. For example, an array of doublewords would benefit from 4-byte alignment: -\c common dwordarray 128:4 +\c common dwordarray 128:4 This declares the total size of the array to be 128 bytes, and requires that it be aligned on a 4-byte boundary. @@ -4018,7 +4157,7 @@ library to be linked to the module, either at load time or run time. This is done by the \c{LIBRARY} directive, which takes one argument which is the name of the module: -\c library mylib.rdl +\c library mylib.rdl \S{rdfmod} Specifying a Module Name: The \i\c{MODULE} Directive @@ -4028,13 +4167,13 @@ It can be used, for example, by run-time loader to perform dynamic linking. \c{MODULE} directive takes one argument which is the name of current module: -\c module mymodname +\c module mymodname Note that when you statically link modules and tell linker to strip the symbols from output file, all module names will be stripped too. To avoid it, you should start module names with \I{$prefix}\c{$}, like: -\c module $kernel.core +\c module $kernel.core \S{rdfglob} \c{rdf} Extensions to the \c{GLOBAL} directive\I{GLOBAL, @@ -4049,17 +4188,17 @@ is a procedure (function) or data object. 
Suffixing the name with a colon and the word \i\c{export} you make the symbol exported: -\c global sys_open:export +\c global sys_open:export To specify that exported symbol is a procedure (function), you add the word \i\c{proc} or \i\c{function} after declaration: -\c global sys_open:export proc +\c global sys_open:export proc Similarly, to specify exported data object, add the word \i\c{data} or \i\c{object} to the directive: -\c global kernel_ticks:export data +\c global kernel_ticks:export data \H{dbgfmt} \i\c{dbg}: Debugging Format @@ -4173,13 +4312,14 @@ the segment registers, and declaring a start point. This file is also provided in the \I{test subdirectory}\c{test} subdirectory of the NASM archives, under the name \c{objexe.asm}. -\c segment code +\c segment code \c -\c ..start: mov ax,data -\c mov ds,ax -\c mov ax,stack -\c mov ss,ax -\c mov sp,stacktop +\c ..start: +\c mov ax,data +\c mov ds,ax +\c mov ax,stack +\c mov ss,ax +\c mov sp,stacktop This initial piece of code sets up \c{DS} to point to the data segment, and initialises \c{SS} and \c{SP} to point to the top of @@ -4193,27 +4333,28 @@ Note also that the special symbol \c{..start} is defined at the beginning of this code, which means that will be the entry point into the resulting executable file. -\c mov dx,hello -\c mov ah,9 -\c int 0x21 +\c mov dx,hello +\c mov ah,9 +\c int 0x21 The above is the main program: load \c{DS:DX} with a pointer to the greeting message (\c{hello} is implicitly relative to the segment \c{data}, which was loaded into \c{DS} in the setup code, so the full pointer is valid), and call the DOS print-string function. -\c mov ax,0x4c00 -\c int 0x21 +\c mov ax,0x4c00 +\c int 0x21 This terminates the program using another DOS system call. -\c segment data -\c hello: db 'hello, world', 13, 10, '$' +\c segment data +\c +\c hello: db 'hello, world', 13, 10, '$' The data segment contains the string we want to display. 
-\c segment stack stack -\c resb 64 +\c segment stack stack +\c resb 64 \c stacktop: The above code declares a stack segment containing 64 bytes of @@ -4292,13 +4433,20 @@ segment (though the segment may change). Execution then begins at write a \c{.COM} program, you would create a source file looking like -\c org 100h -\c section .text -\c start: ; put your code here -\c section .data -\c ; put data items here -\c section .bss -\c ; put uninitialised data here +\c org 100h +\c +\c section .text +\c +\c start: +\c ; put your code here +\c +\c section .data +\c +\c ; put data items here +\c +\c section .bss +\c +\c ; put uninitialised data here The \c{bin} format puts the \c{.text} section first in the file, so you can declare data or BSS items before beginning to write code if @@ -4394,14 +4542,18 @@ not have to worry about name clashes with C symbols. If you find the underscores inconvenient, you can define macros to replace the \c{GLOBAL} and \c{EXTERN} directives as follows: -\c %macro cglobal 1 -\c global _%1 -\c %define %1 _%1 +\c %macro cglobal 1 +\c +\c global _%1 +\c %define %1 _%1 +\c \c %endmacro - -\c %macro cextern 1 -\c extern _%1 -\c %define %1 _%1 +\c +\c %macro cextern 1 +\c +\c extern _%1 +\c %define %1 _%1 +\c \c %endmacro (These forms of the macros only take one argument at a time; a @@ -4409,11 +4561,11 @@ replace the \c{GLOBAL} and \c{EXTERN} directives as follows: If you then declare an external like this: -\c cextern printf +\c cextern printf then the macro will expand it as -\c extern _printf +\c extern _printf \c %define printf _printf Thereafter, you can reference \c{printf} as if it was a symbol, and @@ -4562,15 +4714,19 @@ sequence points without performance suffering. Thus, you would define a function in C style in the following way. 
The following example is for small model: -\c global _myfunc -\c _myfunc: push bp -\c mov bp,sp -\c sub sp,0x40 ; 64 bytes of local stack space -\c mov bx,[bp+4] ; first parameter to function -\c ; some more code -\c mov sp,bp ; undo "sub sp,0x40" above -\c pop bp -\c ret +\c global _myfunc +\c +\c _myfunc: +\c push bp +\c mov bp,sp +\c sub sp,0x40 ; 64 bytes of local stack space +\c mov bx,[bp+4] ; first parameter to function +\c +\c ; some more code +\c +\c mov sp,bp ; undo "sub sp,0x40" above +\c pop bp +\c ret For a large-model function, you would replace \c{RET} by \c{RETF}, and look for the first parameter at \c{[BP+6]} instead of @@ -4582,16 +4738,21 @@ stack when passed as a parameter, whereas near pointers take up two. At the other end of the process, to call a C function from your assembly code, you would do something like this: -\c extern _printf -\c ; and then, further down... -\c push word [myint] ; one of my integer variables -\c push word mystring ; pointer into my data segment -\c call _printf -\c add sp,byte 4 ; `byte' saves space -\c ; then those data items... -\c segment _DATA -\c myint dw 1234 -\c mystring db 'This number -> %d <- should be 1234',10,0 +\c extern _printf +\c +\c ; and then, further down... +\c +\c push word [myint] ; one of my integer variables +\c push word mystring ; pointer into my data segment +\c call _printf +\c add sp,byte 4 ; `byte' saves space +\c +\c ; then those data items... +\c +\c segment _DATA +\c +\c myint dw 1234 +\c mystring db 'This number -> %d <- should be 1234',10,0 This piece of code is the small-model assembly equivalent of the C code @@ -4604,11 +4765,11 @@ this example, it is assumed that \c{DS} already holds the segment base of the segment \c{_DATA}. If not, you would have to initialise it first. -\c push word [myint] -\c push word seg mystring ; Now push the segment, and... -\c push word mystring ; ... 
offset of "mystring" -\c call far _printf -\c add sp,byte 6 +\c push word [myint] +\c push word seg mystring ; Now push the segment, and... +\c push word mystring ; ... offset of "mystring" +\c call far _printf +\c add sp,byte 6 The integer value still takes up one word on the stack, since large model does not affect the size of the \c{int} data type. The first @@ -4631,15 +4792,17 @@ C can access, you need only declare the names as \c{GLOBAL} or in \k{16cunder}.) Thus, a C variable declared as \c{int i} can be accessed from assembler as -\c extern _i -\c mov ax,[_i] +\c extern _i +\c +\c mov ax,[_i] And to declare your own integer variable which C programs can access as \c{extern int j}, you do this (making sure you are assembling in the \c{_DATA} segment, if necessary): -\c global _j -\c _j dw 0 +\c global _j +\c +\c _j dw 0 To access a C array, you need to know the size of the components of the array. For example, \c{int} variables are two bytes long, so if @@ -4688,13 +4851,15 @@ into NASM's preprocessor. See \k{tasmcompat} for details.) An example of an assembly function using the macro set is given here: -\c proc _nearproc -\c %$i arg -\c %$j arg -\c mov ax,[bp + %$i] -\c mov bx,[bp + %$j] -\c add ax,[bx] -\c endproc +\c proc _nearproc +\c +\c %$i arg +\c %$j arg +\c mov ax,[bp + %$i] +\c mov bx,[bp + %$j] +\c add ax,[bx] +\c +\c endproc This defines \c{_nearproc} to be a procedure taking two arguments, the first (\c{i}) an integer and the second (\c{j}) a pointer to an @@ -4723,14 +4888,17 @@ many function parameters will be of type \c{int}. 
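To make the offset bookkeeping concrete, here is a rough Python model of how argument offsets from \c{BP} could be tallied. It is a sketch under stated assumptions, not the macro set's actual implementation: the function name `arg_offsets` and the `far_code` flag are hypothetical, the first parameter is taken to sit at \c{[BP+4]} for a near function and \c{[BP+6]} for a far one (as stated earlier in the text), and arguments are assumed to follow one another with no padding.

```python
DEFAULT_ARG_SIZE = 2  # the text says a size of 2 is assumed when none is given


def arg_offsets(sizes, far_code=False):
    """Return the offset from BP of each argument, in declaration order.

    `sizes` lists the byte size of each argument; `far_code` is a
    hypothetical flag standing in for a far (large-model) function.
    """
    # Saved BP takes 2 bytes; the return address takes 2 (near) or 4 (far).
    offset = 6 if far_code else 4
    offsets = []
    for size in sizes:
        offsets.append(offset)
        offset += size  # assumption: arguments are packed consecutively
    return offsets


if __name__ == "__main__":
    # Two default-sized arguments in a near function.
    print(arg_offsets([DEFAULT_ARG_SIZE, DEFAULT_ARG_SIZE]))
    # An int followed by a 4-byte far pointer in a far function.
    print(arg_offsets([DEFAULT_ARG_SIZE, 4], far_code=True))
```

In this model, a 4-byte far pointer declared as the second argument of a far function starts at offset 8, so its segment half would be found two bytes further on.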
The large-model equivalent of the above function would look like this: \c %define FARCODE -\c proc _farproc -\c %$i arg -\c %$j arg 4 -\c mov ax,[bp + %$i] -\c mov bx,[bp + %$j] -\c mov es,[bp + %$j + 2] -\c add ax,[bx] -\c endproc +\c +\c proc _farproc +\c +\c %$i arg +\c %$j arg 4 +\c mov ax,[bp + %$i] +\c mov bx,[bp + %$j] +\c mov es,[bp + %$j + 2] +\c add ax,[bx] +\c +\c endproc This makes use of the argument to the \c{arg} macro to define a parameter of size 4, because \c{j} is now a far pointer. When we @@ -4829,26 +4997,31 @@ do nothing further. Thus, you would define a function in Pascal style, taking two \c{Integer}-type parameters, in the following way: -\c global myfunc -\c myfunc: push bp -\c mov bp,sp -\c sub sp,0x40 ; 64 bytes of local stack space -\c mov bx,[bp+8] ; first parameter to function -\c mov bx,[bp+6] ; second parameter to function -\c ; some more code -\c mov sp,bp ; undo "sub sp,0x40" above -\c pop bp -\c retf 4 ; total size of params is 4 +\c global myfunc +\c +\c myfunc: push bp +\c mov bp,sp +\c sub sp,0x40 ; 64 bytes of local stack space +\c mov bx,[bp+8] ; first parameter to function +\c mov bx,[bp+6] ; second parameter to function +\c +\c ; some more code +\c +\c mov sp,bp ; undo "sub sp,0x40" above +\c pop bp +\c retf 4 ; total size of params is 4 At the other end of the process, to call a Pascal function from your assembly code, you would do something like this: -\c extern SomeFunc -\c ; and then, further down... -\c push word seg mystring ; Now push the segment, and... -\c push word mystring ; ... offset of "mystring" -\c push word [myint] ; one of my variables -\c call far SomeFunc +\c extern SomeFunc +\c +\c ; and then, further down... +\c +\c push word seg mystring ; Now push the segment, and... +\c push word mystring ; ... 
offset of "mystring" +\c push word [myint] ; one of my variables +\c call far SomeFunc This is equivalent to the Pascal code @@ -4893,14 +5066,17 @@ argument offsets; you must declare your function's arguments in reverse order. For example: \c %define PASCAL -\c proc _pascalproc -\c %$j arg 4 -\c %$i arg -\c mov ax,[bp + %$i] -\c mov bx,[bp + %$j] -\c mov es,[bp + %$j + 2] -\c add ax,[bx] -\c endproc +\c +\c proc _pascalproc +\c +\c %$j arg 4 +\c %$i arg +\c mov ax,[bp + %$i] +\c mov bx,[bp + %$j] +\c mov es,[bp + %$j + 2] +\c add ax,[bx] +\c +\c endproc This defines the same routine, conceptually, as the example in \k{16cmacro}: it defines a function taking two arguments, an integer @@ -5026,28 +5202,37 @@ still pushed in right-to-left order. Thus, you would define a function in C style in the following way: -\c global _myfunc -\c _myfunc: push ebp -\c mov ebp,esp -\c sub esp,0x40 ; 64 bytes of local stack space -\c mov ebx,[ebp+8] ; first parameter to function -\c ; some more code -\c leave ; mov esp,ebp / pop ebp -\c ret +\c global _myfunc +\c +\c _myfunc: +\c push ebp +\c mov ebp,esp +\c sub esp,0x40 ; 64 bytes of local stack space +\c mov ebx,[ebp+8] ; first parameter to function +\c +\c ; some more code +\c +\c leave ; mov esp,ebp / pop ebp +\c ret At the other end of the process, to call a C function from your assembly code, you would do something like this: -\c extern _printf -\c ; and then, further down... -\c push dword [myint] ; one of my integer variables -\c push dword mystring ; pointer into my data segment -\c call _printf -\c add esp,byte 8 ; `byte' saves space -\c ; then those data items... -\c segment _DATA -\c myint dd 1234 -\c mystring db 'This number -> %d <- should be 1234',10,0 +\c extern _printf +\c +\c ; and then, further down... +\c +\c push dword [myint] ; one of my integer variables +\c push dword mystring ; pointer into my data segment +\c call _printf +\c add esp,byte 8 ; `byte' saves space +\c +\c ; then those data items... 
+\c +\c segment _DATA +\c +\c myint dd 1234 +\c mystring db 'This number -> %d <- should be 1234',10,0 This piece of code is the assembly equivalent of the C code @@ -5118,13 +5303,15 @@ the work involved in keeping track of the calling convention. An example of an assembly function using the macro set is given here: -\c proc _proc32 -\c %$i arg -\c %$j arg -\c mov eax,[ebp + %$i] -\c mov ebx,[ebp + %$j] -\c add eax,[ebx] -\c endproc +\c proc _proc32 +\c +\c %$i arg +\c %$j arg +\c mov eax,[ebp + %$i] +\c mov ebx,[ebp + %$j] +\c add eax,[ebx] +\c +\c endproc This defines \c{_proc32} to be a procedure taking two arguments, the first (\c{i}) an integer and the second (\c{j}) a pointer to an @@ -5165,7 +5352,7 @@ must therefore not depend on where it is loaded in memory. Therefore, you cannot get at your variables by writing code like this: -\c mov eax,[myvar] ; WRONG +\c mov eax,[myvar] ; WRONG Instead, the linker provides an area of memory called the \i\e{global offset table}, or \i{GOT}; the GOT is situated at a @@ -5188,25 +5375,28 @@ too much worry (but see \k{picglobal} for a caveat). Each code module in your shared library should define the GOT as an external symbol: -\c extern _GLOBAL_OFFSET_TABLE_ ; in ELF -\c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out +\c extern _GLOBAL_OFFSET_TABLE_ ; in ELF +\c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out At the beginning of any function in your shared library which plans to access your data or BSS sections, you must first calculate the address of the GOT. 
This is typically done by writing the function in this form: -\c func: push ebp -\c mov ebp,esp -\c push ebx -\c call .get_GOT -\c .get_GOT: pop ebx -\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc -\c ; the function body comes here -\c mov ebx,[ebp-4] -\c mov esp,ebp -\c pop ebp -\c ret +\c func: push ebp +\c mov ebp,esp +\c push ebx +\c call .get_GOT +\c .get_GOT: +\c pop ebx +\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc +\c +\c ; the function body comes here +\c +\c mov ebx,[ebp-4] +\c mov esp,ebp +\c pop ebp +\c ret (For BSD, again, the symbol \c{_GLOBAL_OFFSET_TABLE} requires a second leading underscore.) @@ -5238,10 +5428,13 @@ If you didn't follow that, don't worry: it's never necessary to obtain the address of the GOT by any other means, so you can put those three instructions into a macro and safely ignore them: -\c %macro get_GOT 0 -\c call %%getgot -\c %%getgot: pop ebx -\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc +\c %macro get_GOT 0 +\c +\c call %%getgot +\c %%getgot: +\c pop ebx +\c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc +\c \c %endmacro \S{piclocal} Finding Your Local Data Items @@ -5252,7 +5445,7 @@ declared; they can be accessed using the \I{GOTOFF relocation}\c{..gotoff} special \I\c{WRT ..gotoff}\c{WRT} type. The way this works is like this: -\c lea eax,[ebx+myvar wrt ..gotoff] +\c lea eax,[ebx+myvar wrt ..gotoff] The expression \c{myvar wrt ..gotoff} is calculated, when the shared library is linked, to be the offset to the local variable \c{myvar} @@ -5284,7 +5477,7 @@ dynamic linker will place the correct address in it at load time. So to obtain the address of an external variable \c{extvar} in \c{EAX}, you would code -\c mov eax,[ebx+extvar wrt ..got] +\c mov eax,[ebx+extvar wrt ..got] This loads the address of \c{extvar} out of an entry in the GOT. The linker, when it builds the shared library, collects together every @@ -5306,14 +5499,17 @@ declared. 
So to export a function to users of the library, you must use -\c global func:function ; declare it as a function -\c func: push ebp -\c ; etc. +\c global func:function ; declare it as a function +\c +\c func: push ebp +\c +\c ; etc. And to export a data item such as an array, you would have to code -\c global array:data array.end-array ; give the size too -\c array: resd 128 +\c global array:data array.end-array ; give the size too +\c +\c array: resd 128 \c .end: Be careful: If you export a variable to the library user, by @@ -5328,7 +5524,7 @@ Equally, if you need to store the address of an exported global in one of your data sections, you can't do it by means of the standard sort of code: -\c dataptr: dd global_data_item ; WRONG +\c dataptr: dd global_data_item ; WRONG NASM will interpret this code as an ordinary relocation, in which \c{global_data_item} is merely an offset from the beginning of the @@ -5338,7 +5534,7 @@ which resides elsewhere. Instead of the above code, then, you must write -\c dataptr: dd global_data_item wrt ..sym +\c dataptr: dd global_data_item wrt ..sym which makes use of the special \c{WRT} type \I\c{WRT ..sym}\c{..sym} to instruct NASM to search the symbol table for a particular symbol @@ -5347,11 +5543,11 @@ at that address, rather than just relocating by section base. Either method will work for functions: referring to one of your functions by means of -\c funcptr: dd my_function +\c funcptr: dd my_function will give the user the address of the code you wrote, whereas -\c funcptr: dd my_function wrt ..sym +\c funcptr: dd my_function wrt ..sym will give the address of the procedure linkage table for the function, which is where the calling program will \e{believe} the @@ -5420,7 +5616,7 @@ This jump must specify a 48-bit far address, since the target segment is a 32-bit one. However, it must be assembled in a 16-bit segment, so just coding, for example, -\c jmp 0x1234:0x56789ABC ; wrong! +\c jmp 0x1234:0x56789ABC ; wrong! 
will not work, since the offset part of the address will be truncated to \c{0x9ABC} and the jump will be an ordinary 16-bit far @@ -5431,7 +5627,7 @@ generate the required instruction by coding it manually, using \c{DB} instructions. NASM can go one better than that, by actually generating the right instruction itself. Here's how to do it right: -\c jmp dword 0x1234:0x56789ABC ; right +\c jmp dword 0x1234:0x56789ABC ; right \I\c{JMP DWORD}The \c{DWORD} prefix (strictly speaking, it should come \e{after} the colon, since it is declaring the \e{offset} field @@ -5443,7 +5639,7 @@ segment to a 32-bit one. You can do the reverse operation, jumping from a 32-bit segment to a 16-bit one, by means of the \c{WORD} prefix: -\c jmp word 0x8765:0x4321 ; 32 to 16 bit +\c jmp word 0x8765:0x4321 ; 32 to 16 bit If the \c{WORD} prefix is specified in 16-bit mode, or the \c{DWORD} prefix in 32-bit mode, they will be ignored, since each is @@ -5468,8 +5664,8 @@ The easiest way to do this is to make sure you use a register for the address, since any effective address containing a 32-bit register is forced to be a 32-bit address. So you can do -\c mov eax,offset_into_32_bit_segment_specified_by_fs -\c mov dword [fs:eax],0x11223344 +\c mov eax,offset_into_32_bit_segment_specified_by_fs +\c mov dword [fs:eax],0x11223344 This is fine, but slightly cumbersome (since it wastes an instruction and a register) if you already know the precise offset @@ -5480,20 +5676,20 @@ NASM be able to generate the best instruction for the purpose? It can. 
As in \k{mixjump}, you need only prefix the address with the \c{DWORD} keyword, and it will be forced to be a 32-bit address: -\c mov dword [fs:dword my_offset],0x11223344 +\c mov dword [fs:dword my_offset],0x11223344 Also as in \k{mixjump}, NASM is not fussy about whether the \c{DWORD} prefix comes before or after the segment override, so arguably a nicer-looking way to code the above instruction is -\c mov dword [dword fs:my_offset],0x11223344 +\c mov dword [dword fs:my_offset],0x11223344 Don't confuse the \c{DWORD} prefix \e{outside} the square brackets, which controls the size of the data stored at the address, with the one \c{inside} the square brackets which controls the length of the address itself. The two can quite easily be different: -\c mov word [dword 0x12345678],0x9ABC +\c mov word [dword 0x12345678],0x9ABC This moves 16 bits of data to an address specified by a 32-bit offset. @@ -5501,7 +5697,7 @@ offset. You can also specify \c{WORD} or \c{DWORD} prefixes along with the \c{FAR} prefix to indirect far jumps or calls. For example: -\c call dword far [fs:word 0x4321] +\c call dword far [fs:word 0x4321] This instruction contains an address specified by a 16-bit offset; it loads a 48-bit far pointer from that (16-bit segment and 32-bit @@ -5521,7 +5717,7 @@ you are coding \c{LODSB} in a 16-bit segment but it is supposed to be accessing a string in a 32-bit segment, you should load the desired address into \c{ESI} and then code -\c a32 lodsb +\c a32 lodsb The prefix forces the addressing size to 32 bits, meaning that \c{LODSB} loads from \c{[DS:ESI]} instead of \c{[DS:SI]}. To access @@ -5548,8 +5744,8 @@ give the value of the segment register being manipulated. 
To force the 16-bit behaviour of segment-register push and pop instructions, you can use the operand-size prefix \i\c{o16}: -\c o16 push ss -\c o16 push ds +\c o16 push ss +\c o16 push ds This code saves a doubleword of stack space by fitting two segment registers into the space which would normally be consumed by pushing @@ -5609,19 +5805,23 @@ complain that \c{ORG} doesn't work the way they'd like: in order to place the \c{0xAA55} signature word at the end of a 512-byte boot sector, people who are used to MASM tend to code -\c ORG 0 -\c ; some boot sector code -\c ORG 510 -\c DW 0xAA55 +\c ORG 0 +\c +\c ; some boot sector code +\c +\c ORG 510 +\c DW 0xAA55 This is not the intended use of the \c{ORG} directive in NASM, and will not work. The correct way to solve this problem in NASM is to use the \i\c{TIMES} directive, like this: -\c ORG 0 -\c ; some boot sector code -\c TIMES 510-($-$$) DB 0 -\c DW 0xAA55 +\c ORG 0 +\c +\c ; some boot sector code +\c +\c TIMES 510-($-$$) DB 0 +\c DW 0xAA55 The \c{TIMES} directive will insert exactly enough zero bytes into the output to move the assembly point up to 510. This method also @@ -5636,7 +5836,7 @@ find out what's wrong with it. The other common problem with the above code is people who write the \c{TIMES} line as -\c TIMES 510-$ DB 0 +\c TIMES 510-$ DB 0 by reasoning that \c{$} should be a pure number, just like 510, so the difference between them is also a pure number and can happily be @@ -5656,7 +5856,7 @@ involving section bases cannot be passed as arguments to \c{TIMES}. The solution, as in the previous section, is to code the \c{TIMES} line in the form -\c TIMES 510-($-$$) DB 0 +\c TIMES 510-($-$$) DB 0 in which \c{$} and \c{$$} are offsets from the same section base, and so their difference is a pure number. This will solve the @@ -6187,46 +6387,46 @@ to the "cc" in an integer instruction that used a condition code). 
The instructions that use this will give details of what the various mnemonics are, this table is used to help you work out details of what is happening. - -Predi- imm8 Description Relation where: Emula- Result if QNaN - cate Encod- A Is 1st Operand tion NaN Signals - ing B Is 2nd Operand Operand Invalid - -EQ 000B equal A = B False No - -LT 001B less-than A < B False Yes - -LE 010B less-than- A <= B False Yes - or-equal - ---- ---- greater A > B Swap False Yes - than Operands, - Use LT - ---- ---- greater- A >= B Swap False Yes - than-or-equal Operands, - Use LE - -UNORD 011B unordered A, B = Unordered True No - -NEQ 100B not-equal A != B True No - -NLT 101B not-less- NOT(A < B) True Yes - than - -NLE 110B not-less- NOT(A <= B) True Yes - than-or- - equal - ---- ---- not-greater NOT(A > B) Swap True Yes - than Operands, - Use NLT ---- ---- not-greater NOT(A >= B) Swap True Yes - than- Operands, - or-equal Use NLE - -ORD 111B ordered A , B = Ordered False No +\c Predi- imm8 Description Relation where: Emula- Result QNaN +\c cate Encod- A Is 1st Operand tion if NaN Signal +\c ing B Is 2nd Operand Operand Invalid +\c +\c EQ 000B equal A = B False No +\c +\c LT 001B less-than A < B False Yes +\c +\c LE 010B less-than- A <= B False Yes +\c or-equal +\c +\c --- ---- greater A > B Swap False Yes +\c than Operands, +\c Use LT +\c +\c --- ---- greater- A >= B Swap False Yes +\c than-or-equal Operands, +\c Use LE +\c +\c UNORD 011B unordered A, B = Unordered True No +\c +\c NEQ 100B not-equal A != B True No +\c +\c NLT 101B not-less- NOT(A < B) True Yes +\c than +\c +\c NLE 110B not-less- NOT(A <= B) True Yes +\c than-or- +\c equal +\c +\c --- ---- not-greater NOT(A > B) Swap True Yes +\c than Operands, +\c Use NLT +\c +\c --- ---- not-greater NOT(A >= B) Swap True Yes +\c than- Operands, +\c or-equal Use NLE +\c +\c ORD 111B ordered A , B = Ordered False No The unordered relationship is true when at least one of the two values being compared is a NaN or in an unsupported 
format. -- 2.7.4