Where Is Argument 7 Stored Registers

Calling convention

A calling convention governs how functions on a particular architecture and operating system interact. This includes rules about how function arguments are placed, where return values go, what registers functions may use, how they may allocate local variables, and so forth. Calling conventions ensure that functions compiled by different compilers can interoperate, and they ensure that operating systems can run code from different programming languages and compilers. Some aspects of a calling convention are derived from the instruction set itself, but some are conventional, meaning decided upon by people (for instance, at a convention).

Calling conventions constrain both callers and callees. A caller is a function that calls another function; a callee is a function that was called. The currently-executing function is a callee, but not a caller.

For concreteness, we learn the x86-64 calling conventions for Linux. These conventions are shared by many OSes, including MacOS (but not Windows), and are officially called the "System V AMD64 ABI."

The official specification: AMD64 ABI

Argument passing and stack frames

One set of calling convention rules governs how function arguments and return values are passed. On x86-64 Linux, the first six function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, respectively. The seventh and subsequent arguments are passed on the stack, about which more below. The return value is passed in register %rax.
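As a rough illustration of the rule above, the sketch below maps an argument's position to the place it would travel in. It covers only integer and pointer arguments, and the helper name `argument_location` is made up for this example; it is not part of the ABI or of any library.

```cpp
#include <cassert>
#include <string>

// Simplified model: where the n-th (1-based) integer or pointer
// argument is passed on x86-64 Linux, per the rule stated above.
std::string argument_location(int n) {
    static const char* const registers[] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };
    assert(n >= 1);                        // positions are 1-based
    if (n <= 6) return registers[n - 1];   // first six: registers
    return "stack";                        // seventh and later: stack
}
```

Under this model, `argument_location(7)` yields `"stack"`, while `argument_location(4)` yields `"%rcx"`.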

The full rules are more complex than this. You can read them in the AMD64 ABI, section 3.2.3, but they're quite detailed. Some highlights:

A structure argument that fits in a single machine word (64 bits/8 bytes) is passed in a single register.
Example: struct small { char a1, a2; }
A structure that fits in two to four machine words (16–32 bytes) is passed in sequential registers, as if it were multiple arguments.
Example: struct medium { long a1, a2; }
A structure that's larger than four machine words is always passed on the stack.
Example: struct big { long a, b, c, d, e, f, g; }
Floating point arguments are generally passed in special registers, the "SSE registers," that we don't discuss further.
If the return value takes more than eight bytes, then the caller reserves space for the return value, and passes the address of that space as the first argument of the function. The callee will fill in that space when it returns.

Writing small programs to demonstrate these rules is a pleasant exercise; for example:

    struct small { char a1, a2; };
    int f(small s) {
        return s.a1 + 2 * s.a2;
    }

compiles to:

    movl %edi, %eax           # copy argument to %eax
    movsbl %dil, %edi         # %edi := sign-extension of lowest byte of argument (s.a1)
    movsbl %ah, %eax          # %eax := sign-extension of 2nd byte of argument (s.a2)
    movsbl %al, %eax
    leal (%rdi,%rax,2), %eax  # %eax := %edi + 2 * %eax
    ret
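To connect the assembly back to the C++ source, here is a minimal sketch that models the register contents by hand. It assumes, as the listing suggests, that s.a1 occupies the lowest byte of %edi and s.a2 the next byte. The names `register_image` and `f_from_register` are illustrative only, not compiler output.

```cpp
#include <cassert>
#include <cstdint>

struct small { char a1, a2; };

// Assumed register layout: a1 in bits 0-7 and a2 in bits 8-15 of %edi.
uint32_t register_image(small s) {
    return static_cast<uint32_t>(static_cast<uint8_t>(s.a1))
         | (static_cast<uint32_t>(static_cast<uint8_t>(s.a2)) << 8);
}

// Mirrors the assembly: sign-extend each byte, then compute a1 + 2 * a2.
int f_from_register(uint32_t edi) {
    int a1 = static_cast<int8_t>(edi & 0xFF);         // like movsbl %dil, %edi
    int a2 = static_cast<int8_t>((edi >> 8) & 0xFF);  // like movsbl %ah, %eax
    return a1 + 2 * a2;                               // like leal (%rdi,%rax,2), %eax
}
```

For a structure with a1 = 3 and a2 = -4, the modeled register holds 0xFC03, and `f_from_register` recovers 3 + 2 * (-4) = -5.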

Stack

Recall that the stack is a segment of memory used to store objects with automatic lifetime. Typical stack addresses on x86-64 look like 0x7ffd'9f10'4f58, that is, close to 2⁴⁷.

The stack is named after a data structure, which was sort of named after pancakes. Stack data structures support at least three operations: push adds a new element to the "top" of the stack; pop removes the top element, showing whatever was underneath; and top accesses the top element. Note what's missing: the data structure does not allow access to elements other than the top. (Which is sort of how stacks of pancakes work.) This restriction can speed up stack implementations.
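The three operations can be sketched in a few lines of C++. This is a minimal illustration built on std::vector, with a made-up class name `IntStack`; the standard library's own std::stack offers the same interface.

```cpp
#include <cassert>
#include <vector>

// Minimal stack of ints: only the top element is reachable.
class IntStack {
public:
    void push(int x) { items_.push_back(x); }   // add a new top element
    void pop() {                                // remove the top element
        assert(!items_.empty());
        items_.pop_back();
    }
    int top() const {                           // read the top element
        assert(!items_.empty());
        return items_.back();
    }
    bool empty() const { return items_.empty(); }

private:
    std::vector<int> items_;   // hidden: callers cannot index into it
};
```

After pushing 1 and then 2, `top()` returns 2; after one `pop()`, it returns 1 again.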

Like a stack data structure, the stack memory segment is only accessed from the top. The currently running function accesses its local variables; the function's caller, grand-caller, great-grand-caller, and so forth are dormant until the currently running function returns.

x86-64 stacks look like this:

The x86-64 %rsp register is a special-purpose register that defines the current "stack pointer." This holds the address of the current top of the stack. On x86-64, as on many architectures, stacks grow down: a "push" operation adds space for more automatic-lifetime objects by moving the stack pointer left, to a numerically-smaller address, and a "pop" operation recycles space by moving the stack pointer right, to a numerically-larger address. This means that, considered numerically, the "top" of the stack has a smaller address than the "bottom."

This is built in to the architecture by the operation of instructions like pushq, popq, call, and ret. A push instruction pushes a value onto the stack. This both modifies the stack pointer (making it smaller) and modifies the stack segment (by moving data there). For instance, the instruction pushq X means:

    subq $8, %rsp
    movq X, (%rsp)

And popq X undoes the effect of pushq X. It means:

    movq (%rsp), X
    addq $8, %rsp

X can be a register or a memory reference.
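The two expansions above can be modeled in C++ to watch the stack pointer move. This is only a sketch: `StackModel` is a hypothetical toy machine whose memory is a map from addresses to 8-byte values, not a description of real hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the stack segment: %rsp plus a sparse memory of 8-byte words.
struct StackModel {
    uint64_t rsp;
    std::map<uint64_t, uint64_t> memory;

    // pushq X: subq $8, %rsp; movq X, (%rsp)
    void pushq(uint64_t x) {
        rsp -= 8;
        memory[rsp] = x;
    }

    // popq X: movq (%rsp), X; addq $8, %rsp
    uint64_t popq() {
        assert(memory.count(rsp) == 1);   // something must have been pushed here
        uint64_t x = memory[rsp];
        rsp += 8;
        return x;
    }
};
```

Starting from `StackModel m{0x7ffd9f104f58, {}}`, a `m.pushq(42)` lowers `m.rsp` to 0x7ffd9f104f50, and the matching `m.popq()` returns 42 and restores the original stack pointer.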

The portion of the stack reserved for a function is called that function's stack frame. Stack frames are aligned: x86-64 requires that each stack frame be a multiple of 16 bytes, and when a callq instruction begins execution, the %rsp register must be 16-byte aligned. This means that every function's entry %rsp address will be 8 bytes off a multiple of 16.
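The arithmetic behind that last sentence can be checked directly. In this small sketch, the function names are invented for illustration and the address is just an example value that happens to be 16-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

bool is_16_byte_aligned(uint64_t address) { return address % 16 == 0; }

// %rsp as the callee first sees it, given %rsp when callq began executing.
uint64_t entry_rsp(uint64_t rsp_at_callq) {
    assert(is_16_byte_aligned(rsp_at_callq));   // required by the convention
    return rsp_at_callq - 8;                    // callq pushed an 8-byte return address
}
```

For instance, `entry_rsp(0x7ffd9f104f50)` is 0x7ffd9f104f48, which leaves a remainder of 8 when divided by 16.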

Return address and entry and exit sequence

The steps required to call a function are sometimes called the entry sequence and the steps required to return are called the exit sequence. Both caller and callee have responsibilities in each sequence.

To prepare for a function call, the caller performs the following tasks in its entry sequence.

The caller stores the first six arguments in the corresponding registers.
If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame. It stores these in increasing order, so that the 7th argument has a smaller address than the 8th argument, and so forth. The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
The caller saves any caller-saved registers (see below).
The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.

This leaves a stack like this:

To return from a function:

The callee places its return value in %rax.
The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address.
The caller then cleans up any space it prepared for arguments and restores caller-saved registers if necessary.

Particularly simple callees don't need to do much more than return, but most callees will perform more tasks, such as allocating space for local variables and calling functions themselves.
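The entry and exit sequences can be traced on a toy machine. The sketch below is a simplified model, not real hardware: `CallModel` and all addresses are invented for illustration, and the caller places stack arguments with pushq for brevity, whereas a real compiler may instead store them into space it reserved earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy machine: stack pointer, instruction pointer, sparse 8-byte-word memory.
struct CallModel {
    uint64_t rsp;
    uint64_t rip;
    std::map<uint64_t, uint64_t> memory;

    void pushq(uint64_t x) { rsp -= 8; memory[rsp] = x; }

    uint64_t popq() {
        assert(memory.count(rsp) == 1);
        uint64_t x = memory[rsp];
        rsp += 8;
        return x;
    }

    // callq FUNCTION: pushq $NEXT_INSTRUCTION; jmp FUNCTION
    void callq(uint64_t function, uint64_t next_instruction) {
        pushq(next_instruction);
        rip = function;
    }

    // retq: behaves like popq %rip
    void retq() { rip = popq(); }
};
```

Suppose the caller pushes an 8th argument and then a 7th, so the 7th sits at (%rsp), and then executes callq. At callee entry the return address is at (%rsp), the 7th argument at 8(%rsp), and the 8th at 16(%rsp). After retq, %rip holds the return address and the 7th argument is at (%rsp) again, ready for the caller to clean up.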

Callee-saved registers and caller-saved registers

The calling convention gives callers and callees certain guarantees and responsibilities about the values of registers across function calls. Function implementations may expect these guarantees to hold, and must work to fulfill their responsibilities.

The most important responsibility is that certain registers' values must be preserved across function calls. A callee may use these registers, but if it changes them, it must restore them to their original values before returning. These registers are called callee-saved registers. All other registers are caller-saved.

Callers can simply use callee-saved registers across function calls; in this sense they behave like C++ local variables. Caller-saved registers behave differently: if a caller wants to preserve the value of a caller-saved register across a function call, the caller must explicitly save it before the callq and restore it when the function resumes.

On x86-64 Linux, %rbp, %rbx, %r12, %r13, %r14, and %r15 are callee-saved, as (sort of) are %rsp and %rip. The other registers are caller-saved.

Base pointer (frame pointer)

The %rbp register is called the base pointer (and sometimes the frame pointer). For simple functions, an optimizing compiler generally treats this like any other callee-saved general-purpose register. However, for more complex functions, %rbp is used in a specific pattern that facilitates debugging. It works like this:

The first instruction executed on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee must save it.)
The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).

This adjusted value of %rbp is the callee's "frame pointer." The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space for calling different functions.)

Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. This information can be used to trace backwards through callers' stack frames by functions such as debuggers.
The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence restores the caller's %rbp and entry %rsp before returning.
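The backwards trace that a debugger performs can be sketched on a modeled memory. Everything here is illustrative: `Memory` is a toy map from addresses to 8-byte values, `backtrace` is a made-up helper, and the sketch assumes the outermost frame stored 0 as its saved %rbp to mark the end of the chain.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<uint64_t, uint64_t>;

// Follow the chain of saved frame pointers, collecting return addresses.
// (%rbp) holds the caller's %rbp; 8(%rbp) holds the return address.
std::vector<uint64_t> backtrace(const Memory& mem, uint64_t rbp) {
    std::vector<uint64_t> return_addresses;
    while (rbp != 0) {   // assumption: 0 terminates the chain
        assert(mem.count(rbp) == 1 && mem.count(rbp + 8) == 1);
        return_addresses.push_back(mem.at(rbp + 8));
        rbp = mem.at(rbp);   // step to the caller's frame
    }
    return return_addresses;
}
```

With an inner frame at 0x1fd0 whose saved %rbp is 0x2000, and an outer frame at 0x2000 whose saved %rbp is 0, the walk visits the inner frame's return address first and then the outer frame's.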

Stack size and red zone

Functions execute fast because allocating space within a function is simply a matter of decrementing %rsp. This is much cheaper than a call to malloc or new! But making this work takes a lot of machinery. We'll see this in more detail later; but in brief: The operating system knows that %rsp points to the stack, so if a function accesses nonexistent memory near %rsp, the OS assumes it's for the stack and transparently allocates new memory there.

So how can a program "run out of stack"? The operating system puts a limit on each program's stack, and if %rsp gets too low, the program segmentation faults.

The diagram above also shows a nice feature of the x86-64 architecture, namely the red zone. This is a small area above the stack pointer (that is, at lower addresses than %rsp) that can be used by the currently-running function for local variables. The red zone is nice because it can be used without mucking around with the stack pointer; for small functions push and pop instructions end up taking time.

Branches

The processor typically executes instructions in sequence, incrementing %rip each time. Deviations from sequential instruction execution, such as function calls, are called control flow transfers.

Function calls aren't the only kind of control flow transfer. A branch instruction jumps to a new instruction without saving a return address on the stack.

Branches come in two flavors, unconditional and conditional. The jmp or j instruction executes an unconditional branch (like a goto). All other branch instructions are conditional: they only branch if some condition holds. That condition is represented by condition flags that are set as a side effect of every arithmetic operation.

Arithmetic instructions change part of the %rflags register as a side effect of their operation. The most often used flags are:

ZF (zero flag): set iff the result was zero.
SF (sign flag): set iff the most significant bit (the sign bit) of the result was one (i.e., the result was negative if considered as a signed integer).
CF (carry flag): set iff the result overflowed when considered as unsigned (i.e., the result was greater than 2^W - 1).
OF (overflow flag): set iff the result overflowed when considered as signed (i.e., the result was greater than 2^(W-1) - 1 or less than -2^(W-1)).
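To make the four definitions concrete, the following sketch computes them for an addition with W = 8, where the numbers are small enough to check by hand. The `Flags` struct and `add8_flags` function are hypothetical helpers written for this illustration.

```cpp
#include <cassert>   // for assert in the usage checks
#include <cstdint>

struct Flags { bool zf, sf, cf, of; };

// Condition flags after an 8-bit addition a + b (so W = 8).
Flags add8_flags(uint8_t a, uint8_t b) {
    unsigned wide = unsigned(a) + unsigned(b);     // exact sum, at most 510
    uint8_t result = static_cast<uint8_t>(wide);   // sum truncated to 8 bits
    Flags f;
    f.zf = (result == 0);
    f.sf = (result & 0x80) != 0;   // most significant bit of the result
    f.cf = wide > 0xFF;            // exact sum exceeded 2^W - 1
    // Signed overflow: both operands have a sign different from the result's.
    f.of = ((a ^ result) & (b ^ result) & 0x80) != 0;
    return f;
}
```

For example, 0x7F + 1 gives 0x80: SF and OF are set (127 + 1 does not fit in a signed byte) while ZF and CF are clear. By contrast, 0xFF + 1 wraps to 0: ZF and CF are set while SF and OF are clear.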

Although some instructions let you load specific flags into registers (e.g., setz; see CS:APP3e §3.6.2, p203), code more often accesses them via conditional jump or conditional move instructions.

Instruction	Mnemonic	C example	Flags
j (jmp)	Jump	`break;`	(Unconditional)
je (jz)	Jump if equal (zero)	`if (x == y)`	ZF
jne (jnz)	Jump if not equal (nonzero)	`if (x != y)`	!ZF
jg (jnle)	Jump if greater	`if (x > y)`, signed	!ZF && !(SF ^ OF)
jge (jnl)	Jump if greater or equal	`if (x >= y)`, signed	!(SF ^ OF)
jl (jnge)	Jump if less	`if (x < y)`, signed	SF ^ OF
jle (jng)	Jump if less or equal	`if (x <= y)`, signed	(SF ^ OF) || ZF
ja (jnbe)	Jump if above	`if (x > y)`, unsigned	!CF && !ZF
jae (jnb)	Jump if above or equal	`if (x >= y)`, unsigned	!CF
jb (jnae)	Jump if below	`if (x < y)`, unsigned	CF
jbe (jna)	Jump if below or equal	`if (x <= y)`, unsigned	CF || ZF
js	Jump if sign bit	`if (x < 0)`, signed	SF
jns	Jump if not sign bit	`if (x >= 0)`, signed	!SF
jc	Jump if carry bit	N/A	CF
jnc	Jump if not carry bit	N/A	!CF
jo	Jump if overflow bit	N/A	OF
jno	Jump if not overflow bit	N/A	!OF

The test and cmp instructions are often seen before a conditional branch. These operations perform arithmetic but throw away the result, except for condition codes. test performs binary-and, cmp performs subtraction.

cmp is hard to grasp: remember that subq %rax, %rbx performs %rbx := %rbx - %rax, so the source/destination operand is on the right. So cmpq %rax, %rbx evaluates %rbx - %rax. The sequence cmpq %rax, %rbx; jg L will jump to label L if and only if %rbx is greater than %rax (signed).

The weird-looking instruction testq %rax, %rax, or more generally testq REG, SAMEREG, is used to load the condition flags appropriately for a single register. For example, the bitwise-and of %rax and %rax is zero if and only if %rax is zero, so testq %rax, %rax; je L jumps to L if and only if %rax is zero.
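The interaction between cmpq, testq, and the jump conditions can be modeled in C++. This is a sketch under simplifying assumptions: the functions are named after the instructions purely for readability, they only compute flags (they do not jump), and the jump predicates copy the flag formulas from the table above.

```cpp
#include <cassert>   // for assert in the usage checks
#include <cstdint>

struct Flags { bool zf, sf, cf, of; };

// Flags set by cmpq a, b, which evaluates b - a and discards the difference.
Flags cmpq(int64_t a, int64_t b) {
    uint64_t ua = static_cast<uint64_t>(a);
    uint64_t ub = static_cast<uint64_t>(b);
    uint64_t diff = ub - ua;   // wraps modulo 2^64
    Flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 63) != 0;
    f.cf = ub < ua;            // unsigned borrow
    // Signed overflow: operands differ in sign, and the result's sign differs from b's.
    f.of = (((ub ^ ua) & (ub ^ diff)) >> 63) != 0;
    return f;
}

// Flags set by testq r, r: computed from r & r, with CF and OF cleared.
Flags testq_same(int64_t r) {
    Flags f;
    f.zf = (r == 0);
    f.sf = (r < 0);
    f.cf = false;
    f.of = false;
    return f;
}

bool je(Flags f) { return f.zf; }
bool jg(Flags f) { return !f.zf && !(f.sf != f.of); }   // !ZF && !(SF ^ OF)
bool jl(Flags f) { return f.sf != f.of; }               // SF ^ OF
bool ja(Flags f) { return !f.cf && !f.zf; }             // !CF && !ZF
bool jb(Flags f) { return f.cf; }                       // CF
```

With %rax = -1 and %rbx = 1, `jg(cmpq(-1, 1))` is true because 1 is greater than -1 as signed values, while `ja(cmpq(-1, 1))` is false because -1 reinterpreted as unsigned is the largest 64-bit value. Likewise `je(testq_same(0))` is true and `je(testq_same(7))` is false.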

C++ compilers and data structure implementations have been designed to avoid the so-called abstraction penalty, which is when convenient data structures compile to more and more-expensive instructions than simple, raw memory accesses. When this works, it works quite well; for example, this:

    long f(std::vector<int>& v) {
        long sum = 0;
        for (auto& i : v) {
            sum += i;
        }
        return sum;
    }

compiles to this, a very tight loop similar to the C version:

        movq (%rdi), %rax
        movq 8(%rdi), %rcx
        cmpq %rcx, %rax
        je .L4
        movq %rax, %rdx
        addq $4, %rax
        subq %rax, %rcx
        andq $-4, %rcx
        addq %rax, %rcx
        movl $0, %eax
    .L3:
        movslq (%rdx), %rsi
        addq %rsi, %rax
        addq $4, %rdx
        cmpq %rcx, %rdx
        jne .L3
        rep ret
    .L4:
        movl $0, %eax
        ret

We can also use this output to infer some aspects of std::vector's implementation. It looks like:

The first element of a std::vector structure is a pointer to the first element of the vector;
The elements are stored in memory in a simple array;
The second element of a std::vector structure is a pointer to one-past-the-end of the elements of the vector (i.e., if the vector is empty, the first and second elements of the structure have the same value).
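The inferred layout can be written down as a plain struct, with the loop expressed in the pointer form the assembly uses. This is only a stand-in built from the observations above: `IntVectorModel` is a hypothetical type, not the real definition of std::vector, which has further members and varies between library implementations.

```cpp
#include <cassert>

// Hypothetical stand-in for the two fields the assembly reads.
struct IntVectorModel {
    int* first;   // loaded from (%rdi): pointer to the first element
    int* last;    // loaded from 8(%rdi): pointer to one-past-the-end
};

// The same sum as f(), written as a walk between the two pointers.
long sum(const IntVectorModel& v) {
    assert(v.first <= v.last);   // pointers must delimit one array
    long total = 0;
    for (const int* p = v.first; p != v.last; ++p) {
        total += *p;             // like movslq (%rdx), %rsi; addq %rsi, %rax
    }
    return total;
}
```

Given `int data[] = {1, 2, 3, 4}`, the model `IntVectorModel{data, data + 4}` sums to 10, and an empty model with `first == last` sums to 0, matching the early exit through .L4.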