x86/bytecode.txt: improve byte code documentation

Improve the byte code reference documentation to make a few opcodes more clear and add some general properties about the byte codes, including the files that need to be changed when the byte code changes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-11-08 23:27:15 -05:00 · 2025-10-12 11:23:28 -07:00
parent e9fac2faa6
commit 587ed5e36d
1 changed files with 81 additions and 27 deletions
--- a/x86/bytecode.txt
+++ b/x86/bytecode.txt
@@ -1,3 +1,5 @@
 -*- text -*-
 Bytecode specification
 ----------------------
@@ -9,31 +11,72 @@ hexadecimal.
 The mnemonics are the ones used in x86/insns.dat, where applicable.
 The byte code is not stable. Byte codes can be moved around and
 recycled at any time. x86/insnsb.c contains a generated table of
 byte code use frequencies as a comment near the end that can be
 used to identify candidates for recycling, if necessary.
 Several byte codes are equivalent to sequences of other byte codes; if
 those have low usage counts they can be good candidates for
 recycling.
 Operands are numbered starting with 0.
 Operand numbers encoded in byte codes only encode two bits of the
 operand number, with the opcodes \5, \6 and \7 used as prefixes to
 escape to operands 4+. This saves a lot of byte coding space, as these
 operands are extremely rare.
 When byte codes are changed, the following files MUST be updated
 accordingly:
 	this file
 	x86/insns.pl	- many locations
 	disasm/disasm.c	- matches()
 	asm/assemble.c	- calcsize(), gencode(), find_match(), jmp_match()
 In x86/insns.dat, the encoding slot of each operand is encoded as:
 	-	implicit operand (no encoding)
 	x+y	multiple encoding slots for one operand
-	r	"r" position in modr/m, or base register with "+r"
+	r	"r" position in modr/m[1], or base register with "+r"[2]
 	m	"m" position in modr/m
-	n	immediate encoded in the "m" position in modr/m
+	n	immediate encoded in the "m" position in modr/m[3]
-	b	register encoded in the "m" position in modr/m
+	b	register encoded in the "m" position in modr/m[4]
 	x	register encoded in the "x" position in modr/m + sib (MIB)
 	v	"v" register position in vex/evex
-	s	"s" registe rposition in /is4
+	s	"s" register position in /is4
-	w	immediate encoded in the "v" position in vex/evex
+	w	immediate encoded in the "v" position in vex/evex[3]
-	i	first immediate or mem_offs
+	i	first immediate or mem_offs[5]
-	j	second immediate or mem_offs
+	j	second immediate or mem_offs[6]
-Codes            Mnemonic        Explanation
+[1] currently used even for register operands, even though "b" is an
    alias in that case.
 [2] this is technically incorrect and should be "b", but that is the
    way it is currently encoded.
 [3] separate letter code for the benefit of the insns.pl sanity checker.
 [4] currently used mainly when "x" is also used.
 [5] when the modr/m displacement is used as an immediate, it is byte
    coded as an *address-sized* immediate and uses "i". A seg:offs
    pair uses "i" for the offset (thus "ji").
 [6] when the modr/m displacement is used as an immediate and
    another ("true") immediate is present, the "true" immediate uses "j".
    A seg:offs pair uses "j" for the segment (thus "ji").
-\0                                       terminates the code. (Unless it's a literal of course.)
+
-\1..\4                                   that many literal bytes follow in the code stream
+XX below indicates a hexadecimal byte; NN a decimal number.
-\5                                       add 4 to the primary operand number (b, low octdigit)
+
-\6                                       add 4 to the secondary operand number (a, middle octdigit)
+Codes            Mnemonic		 Definition
-\7                                       add 4 to both the primary and the secondary operand number
+
-\10..\13                                 a literal byte follows in the code stream, to be added
+\0               (auto-generated)        end of code sequence (but 0 can be part of a multi-byte
                                         sequence, so byte codes are NOT null-terminated strings.)
 \1..\4           XX XX...                that many literal bytes follow in the code stream
 \5               (auto-generated)        add 4 to the primary operand number (b, low octdigit)
 \6               (auto-generated)        add 4 to the secondary operand number (a, middle octdigit)
 \7               (auto-generated)        add 4 to both the primary and the secondary operand number
 \10..\13         +r                      a literal byte follows in the code stream, to be added
                                         to the register value of operand 0..3
-\14..\17                                 the position of index register operand in MIB (BND insns)
+\14..\17         (auto-generated)        the position of index register operand in MIB (BND insns)
 \20..\23         ib                      a byte immediate operand, from operand 0..3
 \24..\27         ib,u                    a zero-extended byte immediate operand, from operand 0..3
 \30..\33         iw                      a word immediate operand, from operand 0..3
@@ -54,17 +97,20 @@ Codes            Mnemonic        Explanation
 \171\mab         /mrb (e.g /3r0)         a ModRM, with the reg field taken from operand a, and the m
                                         and b fields set to the specified values.
 \172\ab          /is4                    the register number from operand a in bits 7..4, with
-                                         the 4-bit immediate from operand b in bits 3..0.
+                                         the 4-bit immediate from operand b in bits 3..0.
-\173\xab                                 the register number from operand a in bits 7..4, with
+					 For EVEX- or REX2-encodable instructions, the operand is encoded in
                                         bits [3:7..4] and the immediate is restricted to 3 bits
 					 unless the register operand is given the rn_l16 operand flag.
 \173\xab         /is4=NN                 the register number from operand a in bits 7..4, with
                                         the value b in bits 3..0.
-\174..\177                               the register number from operand 0..3 in bits 7..4, and
+\174..\177       /is4                    the register number from operand 0..3 in bits 7..4, and
                                         an arbitrary value in bits 3..0 (assembled as zero.)
 \2ab             /b                      a ModRM, calculated on EA in operand a, with the reg
                                         field equal to digit b.
-\240..\243                               this instruction uses EVEX rather than REX or VEX/XOP, with the
+\240..\243       evex.*                  this instruction uses EVEX rather than REX or VEX/XOP, with the
                                         V register number taken from operand "b" (0..3) (which may
 					 be an immediate, as is used for DFV.)
-\250                                     this instruction uses EVEX rather than REX or VEX/XOP, with the
+\250             evex.*                  this instruction uses EVEX rather than REX or VEX/XOP, with the
                                         V register number set to 0 (subject to the XOR as defined
 					 below)
@@ -88,10 +134,10 @@ EVEX prefixes are followed by the sequence:
                   (compressed displacement encoding)
 \254..\257      id,s                     a signed 32-bit operand to be extended to 64 bits.
-\260..\263                               this instruction uses VEX/XOP rather than REX, with the
+\260..\263      vex.*                    this instruction uses VEX/XOP rather than REX, with the
                                         V register taken from operand "b" 0..3.
 \264..\267	id,u			 an unsigned 32-bit operand to be extended to 64 bits.
-\270                                     this instruction uses VEX/XOP rather than REX, with the
+\270            vex.*                    this instruction uses VEX/XOP rather than REX, with the
                                         V register set to 0.
 VEX/XOP prefixes are followed by the sequence:
 \tmm\wlp	tmm format:	tt 0mm mmm
@@ -112,16 +158,20 @@ VEX/XOP prefixes are followed by the sequence:
 t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
-\271             hlex                       instruction takes XRELEASE (F3) with or without lock
+		 vex+.*			     instruction is encodable either with VEX or EVEX,
                                             depending on the operands. Generates multiple
                                             instruction patterns with different operand encoding
                                             and byte codes.
 \271             hlex                        instruction takes XRELEASE (F3) with or without lock
 \272             hlenl                       instruction takes XACQUIRE/XRELEASE with or without lock
 \273             hle                         instruction takes XACQUIRE/XRELEASE with lock only
 \274..\277       ib,s                        a byte immediate operand, from operand 0..3, sign-extended
                                             to the operand size (if o16/o32/o64 present) or the bit size
 \300..\303	 ibn			     a valid 0F NOP opcode.
-\304..\307
+\304..\307				     a byte immediate from operand 0..3, XOR a specific constant.
-	\0\xNN	 ib^NN			     intermediate byte XOR 0xNN
+	\0\xXX	 ib^XX			     immediate byte XOR 0xXX
-	\1\xNN	 ib,s^NN		     signed intermediate byte XOR 0xNN
+	\1\xXX	 ib,s^XX		     signed immediate byte XOR 0xXX
-	\2\xNN	 ib,u^NN		     unsigned intermediate byte XOR 0xNN
+	\2\xXX	 ib,u^XX		     unsigned immediate byte XOR 0xXX
 \310             a16                         indicates fixed 16-bit address size, i.e. optional 0x67.
 \311             a32                         indicates fixed 32-bit address size, i.e. optional 0x67.
 \312             adf, asz                    (disassembler only) invalid with non-default address size.
@@ -185,3 +235,7 @@ t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
 \376             vsibz|vm32z|vm64z           this instruction takes a ZMM VSIB memory EA
 * No 66 prefix is emitted if combined with VEX/EVEX, np, 66, osp or !osp.
 ## Local variables:
 ## fill-column: 99
 ## End
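
The operand-number escapes (\5, \6, \7) and the /is4 byte layout described in the patched text can be illustrated with a small sketch. This is not NASM's actual code: the names opex, opex_from_escape, primary_operand, secondary_operand and is4_byte are hypothetical, and reading the secondary operand from the low two bits of the middle octal digit is an assumption based on the "two bits" and "middle octdigit" wording above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from asm/assemble.c.
 * Escape state as described in the text: \5 adds 4 to the primary
 * operand (b, low octal digit), \6 adds 4 to the secondary operand
 * (a, middle octal digit), \7 adds 4 to both.
 */
struct opex {
    unsigned primary_add;   /* 0 or 4, added to operand b */
    unsigned secondary_add; /* 0 or 4, added to operand a */
};

/* Translate an escape byte code into the two offsets; other codes give 0/0. */
static struct opex opex_from_escape(uint8_t code)
{
    struct opex e = { 0, 0 };

    if (code == 05 || code == 07)
        e.primary_add = 4;
    if (code == 06 || code == 07)
        e.secondary_add = 4;
    return e;
}

/* Primary operand: low two bits of the byte code, plus the escape offset. */
static unsigned primary_operand(uint8_t code, struct opex e)
{
    return (code & 3) + e.primary_add;
}

/*
 * Secondary operand: ASSUMED to be the low two bits of the middle octal
 * digit, plus the escape offset.
 */
static unsigned secondary_operand(uint8_t code, struct opex e)
{
    return ((code >> 3) & 3) + e.secondary_add;
}

/* /is4 byte: register number in bits 7..4, 4-bit immediate in bits 3..0. */
static uint8_t is4_byte(unsigned reg, unsigned imm4)
{
    assert(reg <= 15 && imm4 <= 15);
    return (uint8_t)((reg << 4) | imm4);
}
```

Under these assumptions, byte code \21 preceded by \5 refers to operand 5, and register 3 with immediate 0xA yields the /is4 byte 0x3A. The EVEX/REX2 restriction to a 3-bit immediate is deliberately left out of the sketch.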