I’ve tried to create a case where the x86-64 assembly would be simple enough to follow:
```fortran
subroutine doit(f)
  external f
  call f
end subroutine

integer function timestwo (n) ! returns 2*n
  implicit none
  integer, intent(in) :: n
  external :: doit
  call doit(multiply_by_2)
contains
  subroutine multiply_by_2()
    timestwo = 2*n
  end subroutine
end function
```
When compiled with `gfortran -Os -fno-inline`, this produces the following assembly (Compiler Explorer):
```asm
multiply_by_2.0:
    mov rax, QWORD PTR [r10]                 ; load the address of n from the context r10 points to
    mov eax, DWORD PTR [rax]                 ; load the value of n
    add eax, eax                             ; 2*n = n + n
    mov DWORD PTR [r10+8], eax               ; store result back in context
    ret
doit_:
    xor eax, eax                             ; al = 0: no vector registers used (convention for calls without a prototype)
    jmp rdi                                  ; tail call to procedure f
timestwo_:
    sub rsp, 56                              ; reserve 56 bytes on the stack
    xor eax, eax                             ; rax = 0 (stored at [rsp+40] below; also al = 0 for the call to doit_)
    mov edx, OFFSET FLAT:multiply_by_2.0     ; take address (offset) of the internal procedure
    mov QWORD PTR [rsp], rdi                 ; store address of n on the stack
    lea rdi, [rsp+12]                        ; load (effective) callback address (to be passed to doit)
    mov QWORD PTR [rsp+40], rax              ; trampoline?
    mov WORD PTR [rsp+12], -17599            ; trampoline?
    mov DWORD PTR [rsp+14], edx              ; trampoline?
    mov WORD PTR [rsp+18], -17847            ; trampoline?
    mov QWORD PTR [rsp+20], rsp              ; trampoline?
    mov DWORD PTR [rsp+28], -1864106167      ; trampoline?
    call doit_                               ; doit takes a single argument passed in rdi
    mov eax, DWORD PTR [rsp+8]               ; retrieve result from the stack (returned in eax)
    add rsp, 56                              ; release reserved stack area
    ret
```
I’ve tried to annotate this to the best of my understanding. There is a group of six mov
instructions which appear to set up the trampoline in the current stack frame:
```text
[rsp + 0]    8 bytes   address of n
[rsp + 8]    4 bytes   area to return 2*n
[rsp + 12]   2 bytes   magic value: -17599        <-- this address is passed to doit
[rsp + 14]   4 bytes   callback offset
[rsp + 18]   2 bytes   magic value: -17847
[rsp + 20]   8 bytes   current stack pointer
[rsp + 28]   4 bytes   magic value: -1864106167
[rsp + 40]   8 bytes   zero (?)
```
I think you can see the trampoline here: instead of the address of multiply_by_2 itself, the address of the trampoline is passed to doit. What I’m missing is where r10 gets set. At some point there should be an instruction like `lea r10, [rsp]` (lea stands for load effective address) that connects the nested procedure to its context, which the internal procedure then reads through r10.
Anyway, the reason this is called an executable stack is that the bytes written to the current stack frame from [rsp + 12] through [rsp + 31] (the last store, at [rsp + 28], is 4 bytes wide) are treated as code (i.e. instructions) when doit jumps to them.
Disclaimer: perhaps this is not the trampoline, but just a struct describing a nested function. I would appreciate it if anyone could double-check this.