Stack Layouts:
with a look forward to virtual addressing


by: burt rosenberg
at: university of miami
created: sept 2015
Revised:

User memory is laid out in virtual address space from address 0 up to some maximum value that depends on a few things. To make things simple, a 32-bit machine running a traditional 32-bit linux kernel will have maximum user space address of 3-Gigabyte.

The space is laid out typically with the program in the lowest addresses, followed by pre-initialized data, followed by a variable sized heap segment that grows up, then starting at the top of the space, a stack segment that grows down.

+----------------------------------------------------------------+  +--------------+
|                  |       |                                     |  |              |
|   text segment   |  BSS  |   heap ---->           <----- stack |  | Linux Kernel |
|                  |       |                                     |  |              |
+----------------------------------------------------------------+  +--------------+
0                                                                3G                4G

Most of the memory used during a program run is on the stack. The heap is for malloc's. The BSS would just be for a few variables like globals, and constant strings. The parameters to function calls (including argc and argv), and the local variables in scope of a function are all on the stack.

The stack is arranged in frames, with each frame associated with the instance of a function call. In IA-32 (x-86) architectures on linux, and in windows as well, a function call such as f(a,b,c) will push onto the stack the values of a, b and c, and then in calling f push onto the stack the address of the the instruction following f. This is called the return address, and is where code flow resumes when the called function executes the return.

The called function creates a new stack frame for the running of the function. A stack frame is identified by a memory address in a register called the base-pointer, EBP (this is specific to IA-32 architecture, but other CPU's might have a similar register). The EBP contains the base of the current stack frame, that is, the highest numbered address in the stack frame. All references to local variables in a function are not made by the name of the variable, but by a numerical offset to the EBP. The compiler will layout the stack frame and replace a local variable name, say i, with the offset -4, if it so happens that the variable i is stored at a location 4 bytes below the top of the stack frame.

To create a new stack frame, the called program pushes the old EBP onto the stack and then loads the EBP with the current stack point, ESP. It then drops the stack pointer by the number of bytes needed in the stack frame, which is determined by the compiler, and how the compiler has laid out and allocated the local variables.

The storing of the old EBP at the top of the current stack frame creates a linked list of stack frames, with the first element of the stack frame being the pointer to the previous stack frame. This allows a return to easily pop the stack frame, simply by reloading EBP with what the EBP points to.

                 +--------->-------+  +----->-----+
                 |                  \|             \
...-------------+-+-+---------------+-+-+----------+-+-----------------------+  
             ...|E|r|call        ...|E|r|          |/|                       |  
           local|B|e|params    local|B|e|          |/|  <-- stack grows down |  
           vars |P|t|...       vars |P|t|          |/|                       |  
...-------------+-+-+---------------+-+-+----------+-+-----------------------+  
     ^           ^ \_______  ________/ \_____  _____/
     |           |         \/                \/
  current SP     |    stack frame        stack frame
                 |
             curent EBP

Example

Here is a listing of a stack. Below is the program which created this stack. The stack frames are noted, along with the chain of stack frame pointers, the return addresses, the pushed paramaters n, a, b, and c, and the local variables i, j and k.

addresses of subroutines: main= 0x8048506, f=0x80484ad, g=0x804844d 0xbff38538 0x00000000 end of chain of stack frames 0xbff38534 0x00000000 0xbff38530 0x08048560 0xbff3852c 0x00000007 0xbff38528 0x00000006 0xbff38524 0x00000005 0xbff38520 0x00000004 0xbff3851c 0x08048557 return address in main 0xbff38518 0xbff38538 next stack frame 0xbff38514 0x40021938 0xbff38510 0x00000001 0xbff3850c 0x0000000c int k 0xbff38508 0x0000000b int j 0xbff38504 0x0000000a int i 0xbff38500 0x401d7ac0 0xbff384fc 0x00000007 parameter c 0xbff384f8 0x00000006 parameter b 0xbff384f4 0x00000005 parameter a 0xbff384f0 0x00000003 parameter n 0xbff384ec 0x080484f1 return address in f 0xbff384e8 0xbff38518 next stack frame 0xbff384e4 0x400316c8 0xbff384e0 0xffffffff 0xbff384dc 0x0000000c int k 0xbff384d8 0x0000000b int j 0xbff384d4 0x0000000a int i 0xbff384d0 0x0804822c 0xbff384cc 0x00000007 parameter c 0xbff384c8 0x00000006 parameter b 0xbff384c4 0x00000005 parameter a 0xbff384c0 0x00000002 parameter n 0xbff384bc 0x080484f1 return address in f 0xbff384b8 0xbff384e8 next stack frame 0xbff384b4 0x00000001 0xbff384b0 0x00000001 0xbff384ac 0x0000000c int k 0xbff384a8 0x0000000b int j 0xbff384a4 0x0000000a int i 0xbff384a0 0xbff38500 0xbff3849c 0x00000007 parameter c 0xbff38498 0x00000006 parameter b 0xbff38494 0x00000005 parameter a 0xbff38490 0x00000001 parameter n 0xbff3848c 0x080484f1 return address in f 0xbff38488 0xbff384b8 next stack frame 0xbff38484 0x40021a94 0xbff38480 0xbff384e8 0xbff3847c 0x0000000c int k 0xbff38478 0x0000000b int j 0xbff38474 0x0000000a int i 0xbff38470 0xbff38530 0xbff3846c 0x00000007 parameter c 0xbff38468 0x00000006 parameter b 0xbff38464 0x00000005 parameter a 0xbff38460 0x00000000 parameter n 0xbff3845c 0x080484f1 return address in f 0xbff38458 0xbff38488 next stack frame 0xbff38454 0x00000001 0xbff38450 0xffffffff 0xbff3844c 0x0000000c int k 0xbff38448 0x0000000b int j 0xbff38444 0x0000000a int i 0xbff38440 0x40021938 0xbff3843c 0x0804824b 0xbff38438 0xbff38450 0xbff38434 0xbff38458 0xbff38430 0x00000064 parameter d 0xbff3842c 0x080484ff return address in f (following call to g) 0xbff38428 0xbff38458 next stack frame 0xbff38424 0x40021af0 0xbff38420 0x4002155c 0xbff3841c 0xbff3841c int * ip 0xbff38418 0x00000000 int i done
#include<stdio.h> #include<stdlib.h> /* * stack frame demonstration program * author: bjr * created: 11 Sept 2014 * lastupdate: * */ int g(int d) { int i = d ; int * ip = &i ; ip += d ; for ( ; i>=0; i-- ) { printf("%p 0x%08x\n", ip, *ip) ; ip -= 1 ; } printf("done\n") ; return 0 ; } int f(int n ,int a, int b, int c) { int i = 0x0a ; int j = 0x0b ; int k = 0x0c ; if ( n ) f(n-1, a, b, c ); else g(100) ; return 0 ; } int main(int argc, char * argv[]) { printf("addresses of subroutines:\n main= %p, f=%p, g=%p\n\n", main, f, g ) ; f(4,5,6,7) ; return 0 ; } // here is it in a binary compile: cc -S stack-demo.c, the .s file produced. .file "stack-demo.c" .section .rodata .LC0: .string "%p 0x%08x\n" .LC1: .string "done" .text .globl g .type g, @function g: .LFB2: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 subl $24, %esp movl %gs:20, %eax movl %eax, -12(%ebp) xorl %eax, %eax movl 8(%ebp), %eax movl %eax, -20(%ebp) leal -20(%ebp), %eax movl %eax, -16(%ebp) movl 8(%ebp), %eax sall $2, %eax addl %eax, -16(%ebp) jmp .L2 .L3: movl -16(%ebp), %eax movl (%eax), %eax subl $4, %esp pushl %eax pushl -16(%ebp) pushl $.LC0 call printf addl $16, %esp subl $4, -16(%ebp) movl -20(%ebp), %eax subl $1, %eax movl %eax, -20(%ebp) .L2: movl -20(%ebp), %eax testl %eax, %eax jns .L3 subl $12, %esp pushl $.LC1 call puts addl $16, %esp movl $0, %eax movl -12(%ebp), %edx xorl %gs:20, %edx je .L5 call __stack_chk_fail .L5: leave .cfi_restore 5 .cfi_def_cfa 4, 4 ret .cfi_endproc .LFE2: .size g, .-g .globl f .type f, @function f: .LFB3: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 subl $24, %esp movl $10, -20(%ebp) movl $11, -16(%ebp) movl $12, -12(%ebp) cmpl $0, 8(%ebp) je .L7 movl 8(%ebp), %eax subl $1, %eax pushl 20(%ebp) pushl 16(%ebp) pushl 12(%ebp) pushl %eax call f addl $16, %esp jmp .L8 .L7: subl $12, %esp pushl $100 call g addl $16, %esp .L8: movl $0, %eax leave .cfi_restore 5 .cfi_def_cfa 4, 4 ret .cfi_endproc .LFE3: .size f, .-f .section .rodata .align 4 .LC2: .string "addresses of subroutines:\n main= %p, f=%p, g=%p\n\n" .text .globl main .type main, @function main: .LFB4: .cfi_startproc leal 4(%esp), %ecx .cfi_def_cfa 1, 0 andl $-16, %esp pushl -4(%ecx) pushl %ebp .cfi_escape 0x10,0x5,0x2,0x75,0 movl %esp, %ebp pushl %ecx .cfi_escape 0xf,0x3,0x75,0x7c,0x6 subl $4, %esp pushl $g pushl $f pushl $main pushl $.LC2 call printf addl $16, %esp pushl $7 pushl $6 pushl $5 pushl $4 call f addl $16, %esp movl $0, %eax movl -4(%ebp), %ecx .cfi_def_cfa 1, 0 leave .cfi_restore 5 leal -4(%ecx), %esp .cfi_def_cfa 4, 4 ret .cfi_endproc .LFE4: .size main, .-main .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits

A look forward to virtual memory

The memory layout described about might worry the reader. It seems that since all 4G of the memory is devoted to a single user program, how does the computer support multiple user's simultaneous use of memory?

One solution is called swapping. Swapping isn't used in this form anymore, but let me explain it this way, since it will work it will just be slow. To change from user program to user program, the entire 3G of memory is copied out to disk. That is called a swap out. Then a previously swapped out image is located on the disk and it is copied into memory, and the program picks up where it left off. This is called a swap in.

Another solution, what was used years ago, is to partition the memory, so that each running program was only allowed use a certain range of addresses. This was very inconvenient because the program needed to be recompiled so that all the instruction addresses and data addresses fit in a range of addresses that wouldn't be know until just before the program ran. Besides, 3G is not a lot, and to start sharing it between all running programs is a problem in itself.

These days, the problem is solved by virtual memory. The addresses that the CPU sees are virtual, they do not exactly correspond to the physical address where the data will be stored in physical memory. The operating system sets up a correspondance between the virtual addresses and available physical memory. The memory fetch and store logic is constantly translating between virtual and physical addresses, according to tables maintained by the operating system.

For more information see Memory Management on Wiki.