mirror of
https://github.com/netwide-assembler/nasm.git
synced 2025-11-08 23:27:15 -05:00
Improve the byte code reference documentation to make a few opcodes more clear and add some general properties about the byte codes, including the files that need to be changed when the byte code changes. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
242 lines
14 KiB
Plaintext
-*- text -*-

Bytecode specification
----------------------

These are the bytecodes generated by x86/insns.pl into x86/insnsb.c
and consumed by asm/assemble.c and disasm/disasm.c.

Values prefixed with \ are in octal, values prefixed with \x are in
hexadecimal.
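This notation is the same as C's escape sequences, so byte code sequences can be written as ordinary C string literals in examples. The sketch below is illustrative only: `literal_seq_len()` and `zero_literal` are hypothetical names, not NASM code. It walks a sequence built only from the \1..\4 literal-byte codes and shows why, as the \0 entry in the table notes, byte codes cannot be treated as null-terminated strings: a literal byte may itself be zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of NASM: return the length of a byte code
 * sequence made only of the literal-byte codes \1..\4, up to (but not
 * including) its terminating \0.  A literal byte may itself be zero, so
 * strlen() would stop too early.  Any other code makes this sketch give up
 * and return -1; the input is assumed to be well formed.
 */
static long literal_seq_len(const unsigned char *code)
{
    const unsigned char *p = code;

    assert(code != NULL);
    while (*p != 0) {
        unsigned char c = *p;

        if (c > 4)
            return -1;      /* not a \1..\4 code: out of scope here */
        p += 1 + c;         /* skip the code byte and its c literal bytes */
    }
    return (long)(p - code);
}

/* \1, then one literal byte that happens to be 0x00, then the \0
 * terminator (supplied implicitly by the string literal). */
static const unsigned char zero_literal[] = "\1\x00";
```

Here `literal_seq_len(zero_literal)` returns 2, whereas `strlen()` would stop after the first byte. A complete walker would also need the length of every multi-byte code in the table.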

The mnemonics are the ones used in x86/insns.dat, where applicable.

The byte code is not stable. Byte codes can be moved around and
recycled at any time. Near the end of x86/insnsb.c, a generated comment
contains a table of byte code use frequencies, which can be used to
identify candidates for recycling if necessary.

Several byte codes are equivalent to sequences of other byte codes; if
those have low usage counts, they can be good candidates for
recycling.
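As an illustration of such equivalences, the table documents that \360 (np) is equivalent to \364\331 and \361 (66) to \366\331. The hypothetical helper below (`expand_sse_shorthands()` is not a NASM function) rewrites a sequence into the longer forms, roughly the substitution one would make before recycling a shorthand code. It assumes the input contains no multi-byte codes whose operand or literal bytes could equal \360 or \361.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of NASM: rewrite the shorthand codes
 *     \360 (np) as \364\331   (!osp, norep)
 *     \361 (66) as \366\331   (osp,  norep)
 * copying every other byte unchanged.  out must have room for 2 * len
 * bytes.  This sketch assumes the input contains no multi-byte codes whose
 * operand or literal bytes could happen to equal \360 or \361.
 */
static size_t expand_sse_shorthands(const unsigned char *in, size_t len,
                                    unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        switch (in[i]) {
        case 0360:              /* np: no SSE prefix */
            out[n++] = 0364;    /* !osp: 0x66 not permitted */
            out[n++] = 0331;    /* norep: no 0xF2/0xF3 */
            break;
        case 0361:              /* 66: 0x66 SSE prefix */
            out[n++] = 0366;    /* osp: 0x66 used as opcode extension */
            out[n++] = 0331;    /* norep */
            break;
        default:
            out[n++] = in[i];
            break;
        }
    }
    return n;
}
```

The return value is the new length; each shorthand grows by one byte, which is the space cost paid for freeing its code.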

Operands are numbered starting with 0.

Byte codes encode only the low two bits of an operand number; the codes
\5, \6 and \7 are used as prefixes to escape to operands 4 and up. This
saves a lot of byte code space, as such operands are extremely rare.
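The split can be illustrated with a small decoder. The struct and function names below are hypothetical and this is not the logic in asm/assemble.c; it only applies the rules stated here and in the \5, \6 and \7 table entries: the primary operand (b) is the low octal digit, the secondary operand (a) the middle one, and each escape adds 4 to the corresponding number.

```c
#include <assert.h>

/* Operand numbers referenced by one operand-carrying byte code. */
struct operand_refs {
    int primary;    /* "b": low octal digit (bits 1..0) */
    int secondary;  /* "a": middle octal digit (bits 4..3) */
};

/*
 * Hypothetical decoder, not NASM's: combine an optional escape code
 * (\5, \6 or \7; pass 0 for none) with the code that follows it.
 * Only the low two bits of each octal digit hold an operand number; the
 * escape supplies the "+4".  For codes that carry a single operand, such
 * as \20..\23 (ib), only the primary operand is meaningful.
 */
static struct operand_refs decode_operands(unsigned char escape,
                                           unsigned char code)
{
    struct operand_refs r;

    assert(escape == 0 || (escape >= 5 && escape <= 7));
    r.primary = code & 3;
    r.secondary = (code >> 3) & 3;

    if (escape == 5 || escape == 7)   /* \5 or \7: primary += 4 */
        r.primary += 4;
    if (escape == 6 || escape == 7)   /* \6 or \7: secondary += 4 */
        r.secondary += 4;
    return r;
}
```

For example, \123 (a /r ModRM code with a = 2, b = 3) preceded by \7 refers to operands 6 (the EA) and 7 (the reg field).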

When byte codes are changed, the following files MUST be updated
accordingly:

    this file
    x86/insns.pl    - many locations
    disasm/disasm.c - matches()
    asm/assemble.c  - calcsize(), gencode(), find_match(), jmp_match()

In x86/insns.dat, the encoding slot of each operand is encoded as:

    -    implicit operand (no encoding)
    x+y  multiple encoding slots for one operand
    r    "r" position in modr/m[1], or base register with "+r"[2]
    m    "m" position in modr/m
    n    immediate encoded in the "m" position in modr/m[3]
    b    register encoded in the "m" position in modr/m[4]
    x    register encoded in the "x" position in modr/m + sib (MIB)
    v    "v" register position in vex/evex
    s    "s" register position in /is4
    w    immediate encoded in the "v" position in vex/evex[3]
    i    first immediate or mem_offs[5]
    j    second immediate or mem_offs[6]

[1] currently used even for register operands, even though "b" is an
    alias in that case.
[2] this is technically incorrect and should be "b", but that is the
    way it is currently encoded.
[3] separate letter code for the benefit of the insns.pl sanity checker.
[4] currently used mainly when "x" is also used.
[5] when the modr/m displacement is used as an immediate, it is byte
    coded as an *address-sized* immediate and uses "i". A seg:offs
    pair uses "i" for the offset (thus "ji").
[6] when the modr/m displacement is used as an immediate and
    another ("true") immediate is present, the "true" immediate uses "j".
    A seg:offs pair uses "j" for the segment (thus "ji").


XX below indicates a hexadecimal byte; NN a decimal number.

Codes       Mnemonic             Definition

\0          (auto-generated)     end of code sequence (but 0 can be part of a multi-byte
                                 sequence, so byte codes are NOT null-terminated strings).
\1..\4      XX XX...             that many literal bytes follow in the code stream
\5          (auto-generated)     add 4 to the primary operand number (b, low octal digit)
\6          (auto-generated)     add 4 to the secondary operand number (a, middle octal digit)
\7          (auto-generated)     add 4 to both the primary and the secondary operand number
\10..\13    +r                   a literal byte follows in the code stream, to be added
                                 to the register value of operand 0..3
\14..\17    (auto-generated)     the position of the index register operand in MIB (BND insns)
\20..\23    ib                   a byte immediate operand, from operand 0..3
\24..\27    ib,u                 a zero-extended byte immediate operand, from operand 0..3
\30..\33    iw                   a word immediate operand, from operand 0..3
\34..\37    iwd                  select between \3[0-3] and \4[0-3]
                                 depending on the *operand* size of the instruction.
\40..\43    id                   a long immediate operand, from operand 0..3
\44..\47    iwdq                 select between \3[0-3], \4[0-3] and \5[4-7]
                                 depending on the *address* size of the instruction.
\50..\53    rel8                 a byte relative operand, from operand 0..3
\54..\57    iq                   a qword immediate operand, from operand 0..3
\60..\63    rel16                a word relative operand, from operand 0..3
\64..\67    rel                  select between \6[0-3] and \7[0-3] depending on 16/32-bit
                                 assembly mode or the operand-size override on the operand
\70..\73    rel32                a long relative operand, from operand 0..3
\74..\77    seg                  a word constant, from the _segment_ part of operand 0..3
\1ab        /r                   a ModRM, calculated on EA in operand a, with the reg
                                 field the register value of operand b.
\171\mab    /mrb (e.g. /3r0)     a ModRM, with the reg field taken from operand a, and the m
                                 and b fields set to the specified values.
\172\ab     /is4                 the register number from operand a in bits 7..4, with
                                 the 4-bit immediate from operand b in bits 3..0.
                                 For EVEX- or REX2-encodable instructions, the register operand is
                                 encoded in bits [3:7..4] and the immediate is restricted to 3 bits
                                 unless the register operand is given the rn_l16 operand flag.
\173\xab    /is4=NN              the register number from operand a in bits 7..4, with
                                 the value b in bits 3..0.
\174..\177  /is4                 the register number from operand 0..3 in bits 7..4, and
                                 an arbitrary value in bits 3..0 (assembled as zero).
\2ab        /b                   a ModRM, calculated on EA in operand a, with the reg
                                 field equal to digit b.
\240..\243  evex.*               this instruction uses EVEX rather than REX or VEX/XOP, with the
                                 V register number taken from operand "b" (0..3), which may
                                 be an immediate, as used for DFV.
\250        evex.*               this instruction uses EVEX rather than REX or VEX/XOP, with the
                                 V register number set to 0 (subject to the XOR as defined
                                 below).

                                 EVEX prefixes are followed by the sequence:

                                     \p1\p2\p3\3tt

                                 ... which are XOR'd into the EVEX payload bytes. These are
                                 used to encode the map and other fixed fields. Note that the
                                 inverted register bits should be set to 1.

                                 These are used in conjunction with the following instruction
                                 flags:

                                     IF_LIG  hint to the disassembler: ignore EVEX.L
                                     IF_WIG  hint to the disassembler: ignore EVEX.W
                                     IF_WW   W is used as REX.W
                                     IF_NF   the {nf} prefix is permitted for this instruction
                                     IF_DFV  this instruction uses the V field as DFV

                                 tt is the tuple type for Disp8*N (compressed displacement
                                 encoding), taken from %tuple_codes in insns.pl.

\254..\257  id,s                 a signed 32-bit operand to be sign-extended to 64 bits.
\260..\263  vex.*                this instruction uses VEX/XOP rather than REX, with the
                                 V register taken from operand "b" (0..3).
\264..\267  id,u                 an unsigned 32-bit operand to be zero-extended to 64 bits.
\270        vex.*                this instruction uses VEX/XOP rather than REX, with the
                                 V register set to 0.
                                 VEX/XOP prefixes are followed by the sequence:

                                     \tmm\wlp

                                 tmm format: tt 0mm mmm
                                 [vex] tt = 0
                                 [xop] tt = 1

                                 mmmmm = M field

                                 wlp format: w0 00l lpp
                                 [l0]  ll = 0 for L = 0 (.128, .lz)
                                 [l1]  ll = 1 for L = 1 (.256)
                                 [lig] ll = 0 for L don't care (always assembled as 0) with IF_LIG

                                 [w0]  w = 0 for W = 0
                                 [w1]  w = 1 for W = 1
                                 [wig] w = 0 for W don't care (always assembled as 0) with IF_WIG
                                 [ww]  w = 0 for W used as REX.W with IF_WW

                                 tt = 0 for VEX (C4/C5), tt = 1 for XOP (8F).

            vex+.*               instruction is encodable with either VEX or EVEX,
                                 depending on the operands. Generates multiple
                                 instruction patterns with different operand encodings
                                 and byte codes.
\271        hlex                 instruction takes XRELEASE (F3) with or without lock
\272        hlenl                instruction takes XACQUIRE/XRELEASE with or without lock
\273        hle                  instruction takes XACQUIRE/XRELEASE with lock only
\274..\277  ib,s                 a byte immediate operand, from operand 0..3, sign-extended
                                 to the operand size (if o16/o32/o64 is present), otherwise to the
                                 bit size.
\300..\303  ibn                  a valid 0F NOP opcode.
\304..\307                       a byte immediate from operand 0..3, XORed with a constant:
                                 \0\xXX  ib^XX     immediate byte XOR 0xXX
                                 \1\xXX  ib,s^XX   signed immediate byte XOR 0xXX
                                 \2\xXX  ib,u^XX   unsigned immediate byte XOR 0xXX
\310        a16                  indicates fixed 16-bit address size, i.e. optional 0x67.
\311        a32                  indicates fixed 32-bit address size, i.e. optional 0x67.
\312        adf, asz             (disassembler only) invalid with non-default address size.
\313        a64                  indicates fixed 64-bit address size, 0x67 invalid.
\314        norexb               (disassembler only) invalid with REX.B
\315        norexx               (disassembler only) invalid with REX.X
\316        norexr               (disassembler only) invalid with REX.R
\317        norexw               (disassembler only) invalid with REX.W
-           o8                   generates no byte code; for orthogonality.
\320        o16*                 indicates fixed 16-bit operand size, i.e. optional 0x66.
\321        o32*                 indicates fixed 32-bit operand size, i.e. optional 0x66.
\322        odf                  indicates that this instruction is only valid when the
                                 operand size is the default (instruction to the disassembler;
                                 generates no code in the assembler).
\323        o64nw                indicates fixed 64-bit operand size (equivalent to nw o64)
\324        o64                  indicates 64-bit operand size requiring REX.W.
\325        nohi                 instruction which always uses spl/bpl/sil/dil
\326        nof3                 (disassembler only) not valid with 0xF3 REP prefix.
\327        nw                   indicates that the operand size defaults to 64 in 64-bit mode;
                                 REX.W is not required. As a side effect, 32-bit operand size is
                                 not available. If followed by an oxx code, this has the
                                 following effects:
                                     o16 - 66 prefix generated in 32- or 64-bit mode.
                                     o32 - 66 prefix generated in 16-bit mode; treated as o64
                                           in 64-bit mode.
                                     o64 - only permitted in 64-bit mode; does not set REX.W
                                           unless combined with code rex.w (\347).
\330        osz                  default or user-specified operand size
\331        norep                not valid with 0xF2 or 0xF3 REP prefixes.
\332        f2                   REP prefix (0xF2 byte) used as opcode extension.
\333        f3                   REP prefix (0xF3 byte) used as opcode extension.
\334        rex.l                LOCK prefix used as REX.R in 16/32-bit mode.
\335        repe                 disassemble a rep (0xF3 byte) prefix as repe, not rep.
\336        optw                 16-, 32- and 64-bit operation identical; allow optimization.
\337        optd                 32- and 64-bit operation identical; allow optimization.
\340        resb                 reserve <operand 0> bytes of uninitialized storage.
                                 Operand 0 must be a segmentless constant.
\341        wait                 this instruction needs a WAIT "prefix"
\342        osm                  o16, o32 or o64 matching bit mode
\343        osd                  o16, o32 or o32 matching bit mode (o32 also in 64-bit mode)
\344        rex.b                REX[2].B used as an opcode extension.
\345        rex.x                REX[2].X used as an opcode extension.
\346        rex.r                REX[2].R used as an opcode extension.
\347        rex.w                REX[2].W used as an opcode extension.
\350..\351  rex2[.w]             obligatory REX2 prefix; rex2.w is equivalent to rex.w rex2
\355..\357  m[1-3] 0f 0f38 0f3a  set the legacy map number. Unless a REX2, VEX, or EVEX prefix
                                 is also generated, these are emitted as legacy prefix bytes.
-           m0                   generates no byte code, but can be used to indicate that
                                 following bytes are literal and not part of a prefix.
\360        np                   no SSE prefix (== \364\331)
\361        66                   66 SSE prefix (== \366\331)
\364        !osp                 operand-size prefix (0x66) not permitted
\365        !asp                 address-size prefix (0x67) not permitted
\366        osp                  operand-size prefix (0x66) used as opcode extension
\367        67                   address-size prefix (0x67) used as opcode extension
\370,\371   jcc8                 match only if operand 0 meets byte jump criteria.
            jmp8                 \370 is used for Jcc, \371 is used for JMP.
\373        jlen                 assemble 0x03 if bits==16, 0x05 if bits==32;
                                 used for a conditional jump over a longer jump.
\374        vsibx|vm32x|vm64x    this instruction takes an XMM VSIB memory EA
\375        vsiby|vm32y|vm64y    this instruction takes a YMM VSIB memory EA
\376        vsibz|vm32z|vm64z    this instruction takes a ZMM VSIB memory EA

* No 66 prefix is emitted if combined with VEX/EVEX, np, 66, osp or !osp.
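Two entries in the table pack several fields into single bytes: the /is4 byte of \172\ab and the \tmm\wlp pair that follows a VEX/XOP code. As a reading aid for those layouts, the hypothetical helpers below (`pack_is4()`, `decode_vex_bytes()` and `struct vex_fields` are not NASM functions) pack and unpack them. The EVEX/REX2 branch of `pack_is4()` assumes that "[3:7..4]" places register bit 4 in bit 3; treat that as an interpretation, not a statement of what asm/assemble.c does.

```c
#include <assert.h>

/*
 * Hypothetical helper, not NASM's: build the /is4 byte of \172\ab.
 * Normally the register number is in bits 7..4 and a 4-bit immediate in
 * bits 3..0.  In the EVEX/REX2 form this sketch ASSUMES "[3:7..4]" means
 * register bit 4 is stored in bit 3, leaving a 3-bit immediate in 2..0.
 */
static unsigned char pack_is4(unsigned reg, unsigned imm, int evex_rex2)
{
    if (!evex_rex2) {
        assert(reg < 16 && imm < 16);
        return (unsigned char)((reg << 4) | imm);
    }
    assert(reg < 32 && imm < 8);
    return (unsigned char)(((reg & 15u) << 4) | (((reg >> 4) & 1u) << 3) | imm);
}

/* Fields of the \tmm\wlp bytes that follow a VEX/XOP code (\260..\263, \270). */
struct vex_fields {
    unsigned tt;     /* 0 = VEX, 1 = XOP */
    unsigned mmmmm;  /* M field */
    unsigned w;      /* W bit */
    unsigned ll;     /* L field */
    unsigned pp;     /* pp field */
};

/* tmm is "tt 0mm mmm" and wlp is "w0 00l lpp", grouped as octal digits. */
static struct vex_fields decode_vex_bytes(unsigned char tmm, unsigned char wlp)
{
    struct vex_fields f;

    f.tt = (tmm >> 6) & 3u;     /* bits 7..6 */
    f.mmmmm = tmm & 037u;       /* bits 4..0 */
    f.w = (wlp >> 7) & 1u;      /* bit 7 */
    f.ll = (wlp >> 2) & 3u;     /* bits 3..2 */
    f.pp = wlp & 3u;            /* bits 1..0 */
    return f;
}
```

For instance, `decode_vex_bytes(0002, 0205)` yields tt = 0 (VEX), M = 2, w = 1, ll = 1 and pp = 1; writing the arguments in octal makes the digit grouping of the two format strings directly visible.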

## Local variables:
## fill-column: 99
## End: