\C{64bit} Writing 64-bit Code (Unix, Win64) This chapter attempts to cover some of the common issues involved when writing 64-bit code, to run under \i{Win64} or Unix. It covers how to write assembly code to interface with 64-bit C routines, and how to write position-independent code for shared libraries. All 64-bit code uses a flat memory model, since segmentation is not available in 64-bit mode. The one exception is the \c{FS} and \c{GS} registers, which still add their bases. Position independence in 64-bit mode is significantly simpler, since the processor supports \c{RIP}-relative addressing directly; see the \c{REL} keyword (\k{effaddr}). On most 64-bit platforms, it is probably desirable to make that the default, using the directive \c{DEFAULT REL} (\k{default}). \c{DEFAULT REL} is likely to become the default in a future version of NASM. 64-bit programming is relatively similar to 32-bit programming, but of course pointers are 64 bits long; additionally, all existing platforms pass arguments in registers rather than on the stack. Furthermore, 64-bit platforms use SSE2 by default for floating point. Please see the ABI documentation for your platform. 64-bit platforms differ in the sizes of the C/C++ fundamental datatypes, not just from 32-bit platforms but from each other. If a specific size data type is desired, it is probably best to use the types defined in the standard C header \c{}. All known 64-bit platforms except some embedded platforms require that the stack is 16-byte aligned at the entry to a function. In order to enforce that, the stack pointer (\c{RSP}) needs to be aligned on an \c{odd} multiple of 8 bytes before the \c{CALL} instruction. In 64-bit mode, the default instruction size is still 32 bits. When loading a value into a 32-bit register (but not an 8- or 16-bit register), the upper 32 bits of the corresponding 64-bit register are set to zero. \H{reg64} Register Names in 64-bit Mode NASM uses the following names for general-purpose registers in 64-bit mode, for 8-, 16-, 32- and 64-bit references, respectively: \c AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B \c AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W \c EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D \c RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15 This is consistent with the AMD documentation and most other assemblers. The Intel documentation, however, uses the names \c{R8L-R15L} for 8-bit references to the higher registers. It is possible to use those names by definiting them as macros; similarly, if one wants to use numeric names for the low 8 registers, define them as macros. The standard macro package \c{altreg} (see \k{pkg_altreg}) can be used for this purpose. \H{id64} Immediates and Displacements in 64-bit Mode In 64-bit mode, immediates and displacements are generally only 32 bits wide. NASM will therefore truncate most displacements and immediates to 32 bits. The only instruction which takes a full \i{64-bit immediate} is: \c MOV reg64,imm64 NASM will produce this instruction whenever the programmer uses \c{MOV} with an immediate into a 64-bit register. If this is not desirable, simply specify the equivalent 32-bit register, which will be automatically zero-extended by the processor, or specify the immediate as \c{DWORD}: \c mov rax,foo ; 64-bit immediate \c mov rax,qword foo ; (identical) \c mov eax,foo ; 32-bit immediate, zero-extended \c mov rax,dword foo ; 32-bit immediate, sign-extended The length of these instructions are 10, 5 and 7 bytes, respectively. If optimization is enabled and NASM can determine at assembly time that a shorter instruction will suffice, the shorter instruction will be emitted unless of course \c{STRICT QWORD} or \c{STRICT DWORD} is specified (see \k{strict}): \c mov rax,1 ; Assembles as "mov eax,1" (5 bytes) \c mov rax,strict qword 1 ; Full 10-byte instruction \c mov rax,strict dword 1 ; 7-byte instruction \c mov rax,symbol ; 10 bytes, not known at assembly time \c lea rax,[rel symbol] ; 7 bytes, usually preferred by the ABI Note that \c{lea rax,[rel symbol]} is position-independent, whereas \c{mov rax,symbol} is not. Most ABIs prefer or even require position-independent code in 64-bit mode. However, the \c{MOV} instruction is able to reference a symbol anywhere in the 64-bit address space, whereas \c{LEA} is only able to access a symbol within within 2 GB of the instruction itself (see below.) The only instructions which take a full \I{64-bit displacement}64-bit \e{displacement} is loading or storing, using \c{MOV}, \c{AL}, \c{AX}, \c{EAX} or \c{RAX} (but no other registers) to an absolute 64-bit address. Since this is a relatively rarely used instruction (64-bit code generally uses relative addressing), the programmer has to explicitly declare the displacement size as \c{ABS QWORD}: \c default abs \c \c mov eax,[foo] ; 32-bit absolute disp, sign-extended \c mov eax,[a32 foo] ; 32-bit absolute disp, zero-extended \c mov eax,[qword foo] ; 64-bit absolute disp \c \c default rel \c \c mov eax,[foo] ; 32-bit relative disp \c mov eax,[a32 foo] ; d:o, address truncated to 32 bits(!) \c mov eax,[qword foo] ; error \c mov eax,[abs qword foo] ; 64-bit absolute disp A sign-extended absolute displacement can access from -2 GB to +2 GB; a zero-extended absolute displacement can access from 0 to 4 GB. \H{unix64} Interfacing to 64-bit C Programs (Unix) On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the CPU in 64-bit mode) is defined by the documents at: \W{https://www.nasm.us/abi/unix64}\c{https://www.nasm.us/abi/unix64} Although written for AT&T-syntax assembly, the concepts apply equally well for NASM-style assembly. What follows is a simplified summary. The first six integer arguments (from the left) are passed in \c{RDI}, \c{RSI}, \c{RDX}, \c{RCX}, \c{R8}, and \c{R9}, in that order. Additional integer arguments are passed on the stack. These registers, plus \c{RAX}, \c{R10} and \c{R11} are destroyed by function calls, and thus are available for use by the function without saving. Integer return values are passed in \c{RAX} and \c{RDX}, in that order. Floating point is done using SSE registers, except for \c{long double}, which is 80 bits (\c{TWORD}) on most platforms (Android is one exception; there \c{long double} is 64 bits and treated the same as \c{double}.) Floating-point arguments are passed in \c{XMM0} to \c{XMM7}; return is \c{XMM0} and \c{XMM1}. \c{long double} are passed on the stack, and returned in \c{ST0} and \c{ST1}. All SSE and x87 registers are destroyed by function calls. On 64-bit Unix, \c{long} is 64 bits. Integer and SSE register arguments are counted separately, so for the case of \c void foo(long a, double b, int c) \c{a} is passed in \c{RDI}, \c{b} in \c{XMM0}, and \c{c} in \c{ESI}. \H{win64} Interfacing to 64-bit C Programs (Win64) The Win64 ABI is described by the document at: \W{https://www.nasm.us/abi/win64}\c{https://www.nasm.us/abi/win64} What follows is a simplified summary. The first four integer arguments are passed in \c{RCX}, \c{RDX}, \c{R8} and \c{R9}, in that order. Additional integer arguments are passed on the stack. These registers, plus \c{RAX}, \c{R10} and \c{R11} are destroyed by function calls, and thus are available for use by the function without saving. Integer return values are passed in \c{RAX} only. Floating point is done using SSE registers, except for \c{long double}. Floating-point arguments are passed in \c{XMM0} to \c{XMM3}; return is \c{XMM0} only. On Win64, \c{long} is 32 bits; \c{long long} or \c{_int64} is 64 bits. Integer and SSE register arguments are counted together, so for the case of \c void foo(long long a, double b, int c) \c{a} is passed in \c{RCX}, \c{b} in \c{XMM1}, and \c{c} in \c{R8D}.