mirror of https://github.com/netwide-assembler/nasm.git synced 2025-11-08 23:27:15 -05:00

x86/bytecode.txt: improve byte code documentation

Improve the byte code reference documentation to make a few opcodes
more clear and add some general properties about the byte codes,
including the files that need to be changed when the byte code
changes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
H. Peter Anvin (Intel)
2025-10-12 11:23:28 -07:00
parent e9fac2faa6
commit 587ed5e36d


@@ -1,3 +1,5 @@
+-*- text -*-
 Bytecode specification
 ----------------------
@@ -9,31 +11,72 @@ hexadecimal.
 The mnemonics are the ones used in x86/insns.dat, where applicable.
+The byte code is not stable. Byte codes can be moved around and
+recycled at any time. x86/insnsb.c contains a generated table of
+byte code use frequencies as a comment near the end that can be
+used to identify candidates for recycling, if necessary.
+Several byte codes are equivalent to sequences of other byte codes; if
+those have low usage counts they can be good candidates for
+recycling.
+Operands are numbered starting with 0.
+Operand numbers encoded in byte codes only encode two bits of the
+operand number, with the opcodes \5, \6 and \7 used as a prefixes to
+escape to operands 4+. This saves a lot of byte coding space, as these
+operands are extremely rare.
+When byte codes are changed, the following files MUST be updated
+accordingly:
+    this file
+    x86/insns.pl    - many locations
+    disasm/disasm.c - matches()
+    asm/assemble.c  - calcsize(), gencode(), find_match(), jmp_match()
 In x86/insns.dat, the encoding slot of each operand is encoded as:
 -    implicit operand (no encoding)
 x+y  multiple encoding slots for one operand
-r    "r" position in modr/m, or base register with "+r"
+r    "r" position in modr/m[1], or base register with "+r"[2]
 m    "m" position in modr/m
-n    immediate encoded in the "m" position in modr/m
-b    register encoded in the "m" position in modr/m
+n    immediate encoded in the "m" position in modr/m[3]
+b    register encoded in the "m" position in modr/m[4]
 x    register encoded in the "x" position in modr/m + sib (MIB)
 v    "v" register position in vex/evex
-s    "s" registe rposition in /is4
-w    immediate encoded in the "v" position in vex/evex
-i    first immediate or mem_offs
-j    second immediate or mem_offs
+s    "s" register position in /is4
+w    immediate encoded in the "v" position in vex/evex[3]
+i    first immediate or mem_offs[5]
+j    second immediate or mem_offs[6]
-Codes       Mnemonic            Explanation
+[1] currently used even for register operands, even though "b" is an
+    alias in that case.
+[2] this is technically incorrect and should be "b", but that is the
+    way it is currently encoded.
+[3] separate letter code for the benefit of the insns.pl sanity checker.
+[4] currently used mainly when "x" is also used.
+[5] when the modr/m displacement is used as an immediate, it is byte
+    coded as an *address-sized* immediate and uses "i". A seg:offs
+    pair uses "i" for the offset (thus "ji").
+[6] when the modr/m displacement is used as an immediate and
+    another ("true") immediate is present, the "true" immediate uses "j".
+    A seg:offs pair uses "j" for the segment (thus "ji").
-\0                              terminates the code. (Unless it's a literal of course.)
-\1..\4                          that many literal bytes follow in the code stream
-\5                              add 4 to the primary operand number (b, low octdigit)
-\6                              add 4 to the secondary operand number (a, middle octdigit)
-\7                              add 4 to both the primary and the secondary operand number
-\10..\13                        a literal byte follows in the code stream, to be added
+XX below indicates a hexadecimal byte; NN a decimal number.
+Codes       Mnemonic            Definition
+\0          (auto-generated)    end of code sequence (but 0 can be part of a multi-byte
+                                sequence, so byte codes are NOT null-terminated strings.)
+\1..\4      XX XX...            that many literal bytes follow in the code stream
+\5          (auto-generated)    add 4 to the primary operand number (b, low octdigit)
+\6          (auto-generated)    add 4 to the secondary operand number (a, middle octdigit)
+\7          (auto-generated)    add 4 to both the primary and the secondary operand number
+\10..\13    +r                  a literal byte follows in the code stream, to be added
                                 to the register value of operand 0..3
-\14..\17                        the position of index register operand in MIB (BND insns)
+\14..\17    (auto-generated)    the position of index register operand in MIB (BND insns)
 \20..\23    ib                  a byte immediate operand, from operand 0..3
 \24..\27    ib,u                a zero-extended byte immediate operand, from operand 0..3
 \30..\33    iw                  a word immediate operand, from operand 0..3
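As a reading aid for the operand-number escape described in the hunk above, here is a minimal Python sketch. It assumes, based only on the table text, that the low two bits of a code such as \20..\23 select operand 0..3 and that a preceding \5, \6 or \7 adds 4 to the primary (b) and/or secondary (a) operand number. The function names are hypothetical and are not taken from asm/assemble.c.

```python
# Hypothetical helpers illustrating the \5/\6/\7 operand escapes.
# Assumption: the escape applies to the byte code that follows it.

def escape_offsets(prefix):
    """Return (add_a, add_b): the amounts added to the secondary (a)
    and primary (b) operand numbers by an escape byte code."""
    table = {
        0o5: (0, 4),  # \5: add 4 to the primary operand number (b)
        0o6: (4, 0),  # \6: add 4 to the secondary operand number (a)
        0o7: (4, 4),  # \7: add 4 to both
    }
    return table.get(prefix, (0, 0))


def primary_operand(code, prefix=None):
    """Operand index named by the low two bits of a byte code,
    adjusted by an optional preceding escape."""
    _, add_b = escape_offsets(prefix)
    return (code & 0o3) + add_b


print(primary_operand(0o22))        # \22 alone names operand 2
print(primary_operand(0o21, 0o5))   # \5 \21 names operand 5
```

Only two bits per code are spent on the operand number, which is why the table lists each operand-bearing code as a range of four values.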
@@ -54,17 +97,20 @@ Codes Mnemonic Explanation
 \171\mab    /mrb (e.g /3r0)     a ModRM, with the reg field taken from operand a, and the m
                                 and b fields set to the specified values.
 \172\ab     /is4                the register number from operand a in bits 7..4, with
-                                the 4-bit immediate from operand b in bits 3..0.
-\173\xab                        the register number from operand a in bits 7..4, with
+                                the 4-bit immediate from operand b in bits 2..0.
+                                For EVEX- or REX2-encodable instructions, the operand is encoded in
+                                bits [3:7..4] and the immediate is restricted to 3 bits
+                                unless the register operand is given the rn_l16 operand flag.
+\173\xab    /is4=NN             the register number from operand a in bits 7..4, with
                                 the value b in bits 3..0.
-\174..\177                      the register number from operand 0..3 in bits 7..4, and
+\174..\177  /is4                the register number from operand 0..3 in bits 7..4, and
                                 an arbitrary value in bits 3..0 (assembled as zero.)
 \2ab        /b                  a ModRM, calculated on EA in operand a, with the reg
                                 field equal to digit b.
-\240..\243                      this instruction uses EVEX rather than REX or VEX/XOP, with the
+\240..\243  evex.*              this instruction uses EVEX rather than REX or VEX/XOP, with the
                                 V register number taken from operand "b" (0..3) (which may
                                 be an immediate, as is used for DFV.)
-\250                            this instruction uses EVEX rather than REX or VEX/XOP, with the
+\250        evex.*              this instruction uses EVEX rather than REX or VEX/XOP, with the
                                 V register number set to 0 (subject to the XOR as defined
                                 below)
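The /is4 entries in the hunk above describe a single byte that carries a register number in bits 7..4 and a small value in bits 3..0. The sketch below shows that packing for the plain 16-register case only. It deliberately ignores the EVEX/REX2 extension to bit 3 mentioned in the added text, and `pack_is4` is a hypothetical name, not a NASM function.

```python
# Hypothetical illustration of the /is4 byte layout:
# register number in bits 7..4, a 4-bit value in bits 3..0.
# Assumption: only registers 0..15 are handled here.

def pack_is4(reg, low=0):
    """Build an /is4 byte from a register number and a low nibble.
    low defaults to 0, matching "assembled as zero" for \\174..\\177."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must fit in 4 bits")
    if not 0 <= low <= 15:
        raise ValueError("low value must fit in 4 bits")
    return (reg << 4) | low


print(hex(pack_is4(3, 5)))  # register 3 with low nibble 5
print(hex(pack_is4(15)))    # register 15, low nibble left as zero
```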
@@ -88,10 +134,10 @@ EVEX prefixes are followed by the sequence:
                                 (compressed displacement encoding)
 \254..\257  id,s                a signed 32-bit operand to be extended to 64 bits.
-\260..\263                      this instruction uses VEX/XOP rather than REX, with the
+\260..\263  vex.*               this instruction uses VEX/XOP rather than REX, with the
                                 V register taken from operand "b" 0..3.
 \264..\267  id,u                an unsigned 32-bit operand to be extended to 64 bits.
-\270                            this instruction uses VEX/XOP rather than REX, with the
+\270        vex.*               this instruction uses VEX/XOP rather than REX, with the
                                 V register set to 0.
 VEX/XOP prefixes are followed by the sequence:
 \tmm\wlp    tmm format: tt 0mm mmm
@@ -112,16 +158,20 @@ VEX/XOP prefixes are followed by the sequence:
 t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
-\271        hlex                instruction takes XRELEASE (F3) with or without lock
+            vex+.*              instruction is encodable either with VEX or EVEX,
+                                depending on the operands. Generates multiple
+                                instruction patterns with different operand encoding
+                                and byte codes.
+\271        hlex                instruction takes XRELEASE (F3) with or without lock
 \272        hlenl               instruction takes XACQUIRE/XRELEASE with or without lock
 \273        hle                 instruction takes XACQUIRE/XRELEASE with lock only
 \274..\277  ib,s                a byte immediate operand, from operand 0..3, sign-extended
                                 to the operand size (if o16/o32/o64 present) or the bit size
 \300..\303  ibn                 a valid 0F NOP opcode.
-\304..\307
-    \0\xNN  ib^NN               intermediate byte XOR 0xNN
-    \1\xNN  ib,s^NN             signed intermediate byte XOR 0xNN
-    \2\xNN  ib,u^NN             unsigned intermediate byte XOR 0xNN
+\304..\307                      a byte immediate from operand 0..3, XOR a specific constant.
+    \0\xXX  ib^XX               intermediate byte XOR 0xXX
+    \1\xXX  ib,s^XX             signed intermediate byte XOR 0xXX
+    \2\xXX  ib,u^XX             unsigned intermediate byte XOR 0xXX
 \310        a16                 indicates fixed 16-bit address size, i.e. optional 0x67.
 \311        a32                 indicates fixed 32-bit address size, i.e. optional 0x67.
 \312        adf, asz            (disassembler only) invalid with non-default address size.
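The \304..\307 entry in the hunk above says the emitted byte is the operand's byte immediate XORed with a constant XX. A minimal sketch of that arithmetic follows. The accepted input range (-128..255) and the name `xor_imm_byte` are assumptions made for illustration. The table does not state how the signed and unsigned variants are range-checked, so that is not modelled.

```python
# Hypothetical illustration of "ib^XX": a byte immediate XOR a constant.
# Assumption: any value representable as a signed or unsigned byte is
# accepted, and the result is truncated to 8 bits.

def xor_imm_byte(imm, const):
    """Return the byte emitted for an immediate XORed with const."""
    if not -128 <= imm <= 255:
        raise ValueError("immediate does not fit in a byte")
    if not 0 <= const <= 0xFF:
        raise ValueError("constant must be a single byte")
    return (imm ^ const) & 0xFF


print(hex(xor_imm_byte(0x12, 0xFF)))  # every bit of 0x12 flipped
print(hex(xor_imm_byte(-1, 0x0F)))    # -1 is 0xFF as a byte
```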
@@ -185,3 +235,7 @@ t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
 \376        vsibz|vm32z|vm64z   this instruction takes an ZMM VSIB memory EA
 * No 66 prefix is emitted if combined with VEX/EVEX, np, 66, osp or !osp.
+## Local variables:
+## fill-column: 99
+## End