Inline Assembler
D, being a systems programming language, provides an inline assembler. The inline assembler is standardized for D implementations across the same CPU family. For example, the Intel Pentium inline assembler for a Win32 D compiler will be syntax compatible with the inline assembler for Linux running on an Intel Pentium.
Implementations of D on different architectures, however, are free to innovate upon the memory model, function call/return conventions, argument passing conventions, etc.
This document describes the x86 and x86_64 implementations of the inline assembler. The inline assembler platform support that a compiler provides is indicated by the D_InlineAsm_X86 and D_InlineAsm_X86_64 version identifiers, respectively.
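As a rough sketch of how these version identifiers might be used, the hypothetical function addOne below (a name chosen for illustration, not part of the specification) uses the inline assembler where one is available and falls back to plain D otherwise. It assumes a compiler that accepts the syntax described in this document.

```d
// Hypothetical helper: returns x + 1, using inline assembler where available.
int addOne(int x)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, x;   // load the parameter
            inc EAX;
            mov x, EAX;   // store the result back
        }
        return x;
    }
    else version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov EAX, x;   // 32 bit registers remain usable on x86_64
            inc EAX;
            mov x, EAX;
        }
        return x;
    }
    else
    {
        return x + 1;     // portable fallback without inline assembler
    }
}
```

With this sketch, assert(addOne(41) == 42) should hold on every branch.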
Asm statement
AsmStatement:
    asm FunctionAttributesopt { AsmInstructionListopt }

AsmInstructionList:
    AsmInstruction ;
    AsmInstruction ; AsmInstructionList

Assembler instructions must be located inside an asm block. Like functions, asm statements must be annotated with adequate function attributes to be compatible with the caller. The attributes of an asm statement must be specified explicitly; they are not inferred.

void func1() pure nothrow @safe @nogc
{
    asm pure nothrow @trusted @nogc
    {}
}

void func2() @safe @nogc
{
    asm @nogc // Error: asm statement is assumed to be @system - mark it with '@trusted' if it is not
    {}
}
Asm instruction
AsmInstruction:
    Identifier : AsmInstruction
    align IntegerExpression
    even
    naked
    db Operands
    ds Operands
    di Operands
    dl Operands
    df Operands
    dd Operands
    de Operands
    db StringLiteral
    ds StringLiteral
    di StringLiteral
    dl StringLiteral
    dw StringLiteral
    dq StringLiteral
    Opcode
    Opcode Operands

Operands:
    Operand
    Operand , Operands
Labels
Assembler instructions can be labeled just like other statements. They can be the target of goto statements. For example:
void *pc;
asm
{
    call L1          ;
L1:                  ;
    pop  EBX         ;
    mov  pc[EBP],EBX ; // pc now points to code at L1
}
align IntegerExpression
IntegerExpression:
    IntegerLiteral
    Identifier
Causes the assembler to emit NOP instructions to align the next assembler instruction on an IntegerExpression boundary. IntegerExpression must evaluate at compile time to an integer that is a power of 2.
Aligning the start of a loop body can sometimes have a dramatic effect on the execution speed.
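A minimal sketch of aligning a loop body follows. The function sumTo and its labels are hypothetical names used only for illustration, and the choice of a 16 byte boundary is an assumption, not a recommendation from this document. Only registers available on both x86 and x86_64 are used.

```d
// Hypothetical helper: computes n + (n-1) + ... + 1 with an aligned loop.
uint sumTo(uint n)
{
    uint total;
    asm
    {
        mov ECX, n;
        xor EAX, EAX;       // running sum starts at 0
        test ECX, ECX;
        jz Ldone;           // nothing to add when n == 0
        align 16;           // pad with NOPs so the loop starts on a 16 byte boundary
    Lloop:
        add EAX, ECX;
        dec ECX;
        jnz Lloop;
    Ldone:
        mov total, EAX;
    }
    return total;
}
```

Whether the alignment helps at all depends on the CPU and the surrounding code, so it is worth measuring rather than assuming.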
even
Causes the assembler to emit NOP instructions to align the next assembler instruction on an even boundary.
naked
Causes the compiler to not generate the function prolog and epilog sequences. Generating them then becomes the responsibility of the inline assembly programmer. naked is normally used when the entire function is to be written in assembler.
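As an illustration, here is a small sketch of a function written entirely in assembler. The name answer is hypothetical. The sketch assumes a calling convention in which an int result is returned in EAX and a function without arguments has nothing to clean up on return, which is the case for the usual x86 and x86_64 conventions.

```d
// Hypothetical function written entirely in assembler.
int answer()
{
    asm
    {
        naked;          // no prolog or epilog is generated
        mov EAX, 42;    // return value goes in EAX (assumed convention)
        ret;            // the programmer must return explicitly
    }
}
```

Because no stack frame is set up, local variables and parameters cannot be referenced by name in the usual way inside such a function.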
db, ds, di, dl, df, dd, de
These pseudo ops are for inserting raw data directly into the code. db is for bytes, ds is for 16 bit words, di is for 32 bit words, dl is for 64 bit words, df is for 32 bit floats, dd is for 64 bit doubles, and de is for 80 bit extended reals. Each can have multiple operands. If an operand is a string literal, it is as if there were length operands, where length is the number of characters in the string. One character is used per operand. For example:
asm
{
    db 5,6,0x83;   // insert bytes 0x05, 0x06, and 0x83 into code
    ds 0x1234;     // insert bytes 0x34, 0x12
    di 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00
    dl 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    df 1.234;      // insert float 1.234
    dd 1.234;      // insert double 1.234
    de 1.234;      // insert real 1.234
    db "abc";      // insert bytes 0x61, 0x62, and 0x63
    ds "abc";      // insert bytes 0x61, 0x00, 0x62, 0x00, 0x63, 0x00
}
Opcodes
A list of supported opcodes is at the end.
The following registers are supported. Register names are always in upper case.
Register:
    AL AH AX EAX
    BL BH BX EBX
    CL CH CX ECX
    DL DH DX EDX
    BP EBP
    SP ESP
    DI EDI
    SI ESI
    ES CS SS DS GS FS
    CR0 CR2 CR3 CR4
    DR0 DR1 DR2 DR3 DR6 DR7
    TR3 TR4 TR5 TR6 TR7
    ST
    ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
    MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
    XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7
x86_64 adds these additional registers.
Register64:
    RAX RBX RCX RDX
    BPL RBP
    SPL RSP
    DIL RDI
    SIL RSI
    R8B R8W R8D R8
    R9B R9W R9D R9
    R10B R10W R10D R10
    R11B R11W R11D R11
    R12B R12W R12D R12
    R13B R13W R13D R13
    R14B R14W R14D R14
    R15B R15W R15D R15
    XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
    YMM0 YMM1 YMM2 YMM3 YMM4 YMM5 YMM6 YMM7
    YMM8 YMM9 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15
Special Cases
- lock, rep, repe, repne, repnz, repz
- These prefix instructions do not appear in the same statement as the instructions they prefix; they appear in their own statement. For example:
asm
{
    rep   ;
    movsb ;
}
- pause
- This opcode is not supported by the assembler; instead use
asm
{
    rep  ;
    nop  ;
}
which produces the same result.
- floating point ops
- Use the two operand form of the instruction format:
fdiv ST(1);     // wrong
fmul ST;        // wrong
fdiv ST,ST(1);  // right
fmul ST,ST(0);  // right
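To show a prefix written as its own statement in a complete function, here is a hedged sketch of a byte copy built on rep and movsb. The function copyBytes and its parameter names are hypothetical. The sketch assumes the compiler saves and restores any callee-saved registers that the asm block modifies.

```d
// Hypothetical helper: copies n bytes from src to dst.
void copyBytes(void* dst, const(void)* src, size_t n)
{
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RSI, src;
            mov RDI, dst;
            mov RCX, n;
            cld;        // copy forwards
            rep;        // the prefix is a statement of its own
            movsb;
        }
    }
    else version (D_InlineAsm_X86)
    {
        asm
        {
            mov ESI, src;
            mov EDI, dst;
            mov ECX, n;
            cld;
            rep;
            movsb;
        }
    }
    else
    {
        // portable fallback: slice copy
        (cast(ubyte*) dst)[0 .. n] = (cast(const(ubyte)*) src)[0 .. n];
    }
}
```

Copying a small ubyte array with copyBytes(b.ptr, a.ptr, a.length) should leave b equal to a.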
Operands
Operand:
    AsmExp

AsmExp:
    AsmLogOrExp
    AsmLogOrExp ? AsmExp : AsmExp

AsmLogOrExp:
    AsmLogAndExp
    AsmLogOrExp || AsmLogAndExp

AsmLogAndExp:
    AsmOrExp
    AsmLogAndExp && AsmOrExp

AsmOrExp:
    AsmXorExp
    AsmOrExp | AsmXorExp

AsmXorExp:
    AsmAndExp
    AsmXorExp ^ AsmAndExp

AsmAndExp:
    AsmEqualExp
    AsmAndExp & AsmEqualExp

AsmEqualExp:
    AsmRelExp
    AsmEqualExp == AsmRelExp
    AsmEqualExp != AsmRelExp

AsmRelExp:
    AsmShiftExp
    AsmRelExp < AsmShiftExp
    AsmRelExp <= AsmShiftExp
    AsmRelExp > AsmShiftExp
    AsmRelExp >= AsmShiftExp

AsmShiftExp:
    AsmAddExp
    AsmShiftExp << AsmAddExp
    AsmShiftExp >> AsmAddExp
    AsmShiftExp >>> AsmAddExp

AsmAddExp:
    AsmMulExp
    AsmAddExp + AsmMulExp
    AsmAddExp - AsmMulExp

AsmMulExp:
    AsmBrExp
    AsmMulExp * AsmBrExp
    AsmMulExp / AsmBrExp
    AsmMulExp % AsmBrExp

AsmBrExp:
    AsmUnaExp
    AsmBrExp [ AsmExp ]

AsmUnaExp:
    AsmTypePrefix AsmExp
    offsetof AsmExp
    seg AsmExp
    + AsmUnaExp
    - AsmUnaExp
    ! AsmUnaExp
    ~ AsmUnaExp
    AsmPrimaryExp

AsmPrimaryExp:
    IntegerLiteral
    FloatLiteral
    __LOCAL_SIZE
    $
    Register
    Register : AsmExp
    Register64
    Register64 : AsmExp
    DotIdentifier
    this

DotIdentifier:
    Identifier
    Identifier . DotIdentifier
    FundamentalType . Identifier

The operand syntax more or less follows the Intel CPU documentation conventions. In particular, the convention is that for two operand instructions the source is the right operand and the destination is the left operand. The syntax differs from Intel's in order to be compatible with the D language tokenizer and to simplify parsing.
The seg operator loads the segment number that the symbol is in. This is not relevant for flat model code. Instead, do a move from the relevant segment register.
A dotted expression is evaluated at compile time and must either yield a constant or refer to a higher level variable that fits in the target register or variable.
Operand Types
AsmTypePrefix:
    near ptr
    far ptr
    word ptr
    dword ptr
    qword ptr
    FundamentalType ptr
In cases where the operand size is ambiguous, as in:
    add [EAX],3 ;
it can be disambiguated by using an AsmTypePrefix:
    add  byte ptr [EAX],3 ;
    add  int ptr [EAX],7  ;
far ptr is not relevant for flat model code.
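The following sketch puts a type prefix to work in a complete function. The name addThree is hypothetical. The dword ptr prefix tells the assembler that the memory operand is a 32 bit value, matching the int that the pointer refers to.

```d
// Hypothetical helper: adds 3 to the int that p points to and returns the new value.
int addThree(int* p)
{
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RAX, p;
            add dword ptr [RAX], 3;   // operand size made explicit
        }
    }
    else version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, p;
            add dword ptr [EAX], 3;
        }
    }
    else
    {
        *p += 3;                      // portable fallback
    }
    return *p;
}
```

Without the prefix the assembler has no way to tell whether a byte, word or dword in memory is meant, since the immediate 3 fits in all of them.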
Struct/Union/Class Member Offsets
To access members of an aggregate, given a pointer to the aggregate in a register, use the .offsetof property of the qualified name of the member:
struct Foo
{
    int a,b,c;
}

int bar(Foo *f)
{
    asm
    {
        mov EBX,f                   ;
        mov EAX,Foo.b.offsetof[EBX] ;
    }
}

void main()
{
    Foo f = Foo(0, 2, 0);
    assert(bar(&f) == 2);
}
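The example above uses 32 bit registers to hold the pointer. A possible x86_64 counterpart is sketched below, under the assumption that the same .offsetof syntax is accepted with a 64 bit base register. The name bar64 is hypothetical, and the result is stored in a local variable rather than left in EAX so that the fallback branch can share the return statement.

```d
struct Foo
{
    int a, b, c;
}

// Hypothetical 64 bit variant: reads member b through a pointer held in RBX.
int bar64(Foo* f)
{
    int result;
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RBX, f;                     // pointer to the aggregate
            mov EAX, Foo.b.offsetof[RBX];   // load the member at its offset
            mov result, EAX;
        }
    }
    else
    {
        result = f.b;                       // portable fallback
    }
    return result;
}
```

With Foo f = Foo(0, 2, 0), the call bar64(&f) should return 2.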
Alternatively, inside the scope of an aggregate, only the member name is needed:
struct Foo   // or class
{
    int a,b,c;
    int bar()
    {
        asm
        {
            mov EBX, this   ;
            mov EAX, b[EBX] ;
        }
    }
}

void main()
{
    Foo f = Foo(0, 2, 0);
    assert(f.bar() == 2);
}
Stack Variables
Stack variables (variables local to a function and allocated on the stack) are accessed via the name of the variable indexed by EBP:
int foo(int x)
{
    asm
    {
        mov EAX,x[EBP] ; // loads value of parameter x into EAX
        mov EAX,x      ; // does the same thing
    }
}
If the [EBP] is omitted, it is assumed for local variables. If naked is used, this no longer holds.
Special Symbols
- $
- Represents the program counter of the start of the next instruction. So,
    jmp  $  ;
branches to the instruction following the jmp instruction. The $ can only appear as the target of a jmp or call instruction.
- __LOCAL_SIZE
- This gets replaced by the number of local bytes in the local stack frame. It is most handy when naked is used and a custom stack frame is programmed.
Opcodes Supported
aaa | aad | aam | aas | adc |
add | addpd | addps | addsd | addss |
and | andnpd | andnps | andpd | andps |
arpl | bound | bsf | bsr | bswap |
bt | btc | btr | bts | call |
cbw | cdq | clc | cld | clflush |
cli | clts | cmc | cmova | cmovae |
cmovb | cmovbe | cmovc | cmove | cmovg |
cmovge | cmovl | cmovle | cmovna | cmovnae |
cmovnb | cmovnbe | cmovnc | cmovne | cmovng |
cmovnge | cmovnl | cmovnle | cmovno | cmovnp |
cmovns | cmovnz | cmovo | cmovp | cmovpe |
cmovpo | cmovs | cmovz | cmp | cmppd |
cmpps | cmps | cmpsb | cmpsd | cmpss |
cmpsw | cmpxchg | cmpxchg8b | cmpxchg16b |
comisd | comiss |
cpuid | cvtdq2pd | cvtdq2ps | cvtpd2dq | cvtpd2pi |
cvtpd2ps | cvtpi2pd | cvtpi2ps | cvtps2dq | cvtps2pd |
cvtps2pi | cvtsd2si | cvtsd2ss | cvtsi2sd | cvtsi2ss |
cvtss2sd | cvtss2si | cvttpd2dq | cvttpd2pi | cvttps2dq |
cvttps2pi | cvttsd2si | cvttss2si | cwd | cwde |
da | daa | das | db | dd |
de | dec | df | di | div |
divpd | divps | divsd | divss | dl |
dq | ds | dt | dw | emms |
enter | f2xm1 | fabs | fadd | faddp |
fbld | fbstp | fchs | fclex | fcmovb |
fcmovbe | fcmove | fcmovnb | fcmovnbe | fcmovne |
fcmovnu | fcmovu | fcom | fcomi | fcomip |
fcomp | fcompp | fcos | fdecstp | fdisi |
fdiv | fdivp | fdivr | fdivrp | feni |
ffree | fiadd | ficom | ficomp | fidiv |
fidivr | fild | fimul | fincstp | finit |
fist | fistp | fisub | fisubr | fld |
fld1 | fldcw | fldenv | fldl2e | fldl2t |
fldlg2 | fldln2 | fldpi | fldz | fmul |
fmulp | fnclex | fndisi | fneni | fninit |
fnop | fnsave | fnstcw | fnstenv | fnstsw |
fpatan | fprem | fprem1 | fptan | frndint |
frstor | fsave | fscale | fsetpm | fsin |
fsincos | fsqrt | fst | fstcw | fstenv |
fstp | fstsw | fsub | fsubp | fsubr |
fsubrp | ftst | fucom | fucomi | fucomip |
fucomp | fucompp | fwait | fxam | fxch |
fxrstor | fxsave | fxtract | fyl2x | fyl2xp1 |
hlt | idiv | imul | in | inc |
ins | insb | insd | insw | int |
into | invd | invlpg | iret | iretd |
iretq | ja | jae | jb | jbe |
jc | jcxz | je | jecxz | jg |
jge | jl | jle | jmp | jna |
jnae | jnb | jnbe | jnc | jne |
jng | jnge | jnl | jnle | jno |
jnp | jns | jnz | jo | jp |
jpe | jpo | js | jz | lahf |
lar | ldmxcsr | lds | lea | leave |
les | lfence | lfs | lgdt | lgs |
lidt | lldt | lmsw | lock | lods |
lodsb | lodsd | lodsw | loop | loope |
loopne | loopnz | loopz | lsl | lss |
ltr | maskmovdqu | maskmovq | maxpd | maxps |
maxsd | maxss | mfence | minpd | minps |
minsd | minss | mov | movapd | movaps |
movd | movdq2q | movdqa | movdqu | movhlps |
movhpd | movhps | movlhps | movlpd | movlps |
movmskpd | movmskps | movntdq | movnti | movntpd |
movntps | movntq | movq | movq2dq | movs |
movsb | movsd | movss | movsw | movsx |
movupd | movups | movzx | mul | mulpd |
mulps | mulsd | mulss | neg | nop |
not | or | orpd | orps | out |
outs | outsb | outsd | outsw | packssdw |
packsswb | packuswb | paddb | paddd | paddq |
paddsb | paddsw | paddusb | paddusw | paddw |
pand | pandn | pavgb | pavgw | pcmpeqb |
pcmpeqd | pcmpeqw | pcmpgtb | pcmpgtd | pcmpgtw |
pextrw | pinsrw | pmaddwd | pmaxsw | pmaxub |
pminsw | pminub | pmovmskb | pmulhuw | pmulhw |
pmullw | pmuludq | pop | popa | popad |
popf | popfd | por | prefetchnta | prefetcht0 |
prefetcht1 | prefetcht2 | psadbw | pshufd | pshufhw |
pshuflw | pshufw | pslld | pslldq | psllq |
psllw | psrad | psraw | psrld | psrldq |
psrlq | psrlw | psubb | psubd | psubq |
psubsb | psubsw | psubusb | psubusw | psubw |
punpckhbw | punpckhdq | punpckhqdq | punpckhwd | punpcklbw |
punpckldq | punpcklqdq | punpcklwd | push | pusha |
pushad | pushf | pushfd | pxor | rcl |
rcpps | rcpss | rcr | rdmsr | rdpmc |
rdtsc | rep | repe | repne | repnz |
repz | ret | retf | rol | ror |
rsm | rsqrtps | rsqrtss | sahf | sal |
sar | sbb | scas | scasb | scasd |
scasw | seta | setae | setb | setbe |
setc | sete | setg | setge | setl |
setle | setna | setnae | setnb | setnbe |
setnc | setne | setng | setnge | setnl |
setnle | setno | setnp | setns | setnz |
seto | setp | setpe | setpo | sets |
setz | sfence | sgdt | shl | shld |
shr | shrd | shufpd | shufps | sidt |
sldt | smsw | sqrtpd | sqrtps | sqrtsd |
sqrtss | stc | std | sti | stmxcsr |
stos | stosb | stosd | stosw | str |
sub | subpd | subps | subsd | subss |
syscall | sysenter | sysexit | sysret | test |
ucomisd | ucomiss | ud2 | unpckhpd | unpckhps |
unpcklpd | unpcklps | verr | verw | wait |
wbinvd | wrmsr | xadd | xchg | xlat |
xlatb | xor | xorpd | xorps |
Pentium 4 (Prescott) Opcodes Supported
addsubpd | addsubps | fisttp | haddpd | haddps |
hsubpd | hsubps | lddqu | monitor | movddup |
movshdup | movsldup | mwait |
AMD Opcodes Supported
pavgusb | pf2id | pfacc | pfadd | pfcmpeq |
pfcmpge | pfcmpgt | pfmax | pfmin | pfmul |
pfnacc | pfpnacc | pfrcp | pfrcpit1 | pfrcpit2 |
pfrsqit1 | pfrsqrt | pfsub | pfsubr | pi2fd |
pmulhrw | pswapd |
SIMD
SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AVX are supported.