Review of Processor Architecture

The Intel 8085 Processor (1976)


by: burt rosenberg
at: university of miami
update: 11 sept 2019
30 aug 2024

Overview
8085 Microarchitecture
The 8085 Instruction Set
Privileged Mode
Interrupts
Virtual Memory
Stack and Call ABI

Overview

On the bicentennial year of America's founding, Intel introduced the first fully functional computer on a chip, the Intel 8085. The current, powerful, Intel processors are direct descendants of this 8-bit microprocessor.

It is called an 8-bit microprocessor because its data paths were one byte, 8 bits, wide. The address lines, however, carried two bytes, 16 bits, so 65536 bytes (64 KiB) of memory were supported.

The 8085 was made possible by the increasing density of components etched and deposited into a single wafer of silicon in a photographically based process called VLSI, Very Large Scale Integration. Ever increasing numbers of transistors could be printed onto a single wafer. To create the 8085, 6,500 transistors were required. The Intel i5, the CPU in the computer I was using when I started writing this page, has two billion transistors.

Intel 8085 Microarchitecture

8085 Microarchitecture

Because of its simplicity, the 8085 is a good place to start a review of processor architecture. The major deficiencies of the 8085 are,

it supports only direct or indirect addressing,
it does not support virtual memory,
it does not support privileged operating modes, and
it has a simplified interrupt scheme.

Otherwise, however, it has all the most important elements still current in today's Intel chips, as shown in the diagram to the right.

The General Purpose Registers (B, C, D, E, H, L): Six 8 bit registers, the source or destination of data. They can be used in pairs BC, DE and HL as 16 bit registers.
The addressing modes supported are,
- Immediate: loading into a register the data in the instruction
- Direct: loading and storing a register at the address given in the instruction
- Register: moving data from register to register
- Register Indirect: loading or storing a register at the memory address in register HL
The Stack Pointer (SP): Holds the 16 bit address of the lowest byte in a region of memory arranged as a stack. A push first decrements the SP by two bytes, then stores the contents of register pair R at that address.
```
    PUSH ↧: *(--SP) = R 
```
A pop loads the contents at the address in the SP into the register pair R, then increments the SP by two bytes.
```
    POP ↥:  R = *(SP++) 
```
The Program Counter (PC): The more modern name is the Instruction pointer (IP). Always contains the 16 bit address in memory of the next instruction to be fetched and executed.
The Flag Register: a register in which each bit has a meaning. This communicates results between instructions, such as the zero flag, which indicates that the previously executed instruction yielded zero.
The Interrupt Control: The 8085 supported vectored interrupts, however in a manner very simple compared to interrupt handling in today's microarchitectures.
In response to an electrical signal on the interrupt control lines, in collaboration with support chips, the next instruction at the PC is ignored and the instruction RST(n) is pushed into the data stream. The RST(n) is a call to memory location 8*n, for n equal to 0 through 7. As it is a call, the PC is pushed onto the stack,
```
        *(--SP) = PC
        PC = 0  // (or 8, 16, ... , 56 depending on n)
```
Typically the target of the RST is a jump instruction (JMP) to code handling the interrupt at level n. To later resume the interrupted instruction flow, the PC is popped off the stack, and any interrupt mask bits are cleared.
```
       PC = *(SP++)
```
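The push, pop and restart behaviour described above can be sketched in C. This is a minimal model rather than an emulator: the names `cpu8085`, `push16`, `pop16`, `rst` and `ret` are made up for illustration, memory is a flat 64 KiB array, and the interrupt mask is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the 8085 state used in this section. */
typedef struct {
    uint16_t pc;            /* program counter */
    uint16_t sp;            /* stack pointer */
    uint8_t  mem[65536];    /* flat 64 KiB memory */
} cpu8085;

/* PUSH: the stack grows downward; the high byte lands at the higher address. */
static void push16(cpu8085 *c, uint16_t v) {
    c->mem[--c->sp] = (uint8_t)(v >> 8);     /* high byte */
    c->mem[--c->sp] = (uint8_t)(v & 0xFF);   /* low byte */
}

/* POP: read the low byte, then the high byte, moving SP back up by two. */
static uint16_t pop16(cpu8085 *c) {
    uint16_t lo = c->mem[c->sp++];
    uint16_t hi = c->mem[c->sp++];
    return (uint16_t)((hi << 8) | lo);
}

/* RST n: a call to address 8*n, for n equal to 0 through 7. */
static void rst(cpu8085 *c, unsigned n) {
    assert(n < 8);
    push16(c, c->pc);              /* remember where to resume */
    c->pc = (uint16_t)(8 * n);     /* continue at the restart location */
}

/* RET: resume the interrupted instruction flow. */
static void ret(cpu8085 *c) {
    c->pc = pop16(c);
}
```

Calling `rst(&c, 5)` with `c.pc == 0x1234` leaves the PC at 40 and the old PC in the two bytes just below the previous stack pointer; `ret(&c)` undoes both.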

The 8085 Instruction Set

Here is a brief look at the 8085 Instruction Set, so these concepts are made exact.

Data Transfer group
- MOV r1, r2: Move register. Move contents of register r2 to r1.
- MOV M, r: Move to memory. Move contents of register r to memory address ((H)(L)).
- MVI r1, data: Move immediate. The contents of the byte following the opcode are moved to r1.
Stack group
- PUSH rp: Push register pair. The stack pointer is decremented by two. The register pair is copied to the new stack locations.
- POP rp: Pop register pair. The register pair receives the contents of the top two stack locations. The stack pointer is incremented by two, removing them from the stack.
Arithmetic group
- ADD r: Add register. Contents of register r are added to the accumulator, and flags are set (such as Z, the zero flag).
Branch group
- JMP addr: Jump address. Contents of the two following bytes are loaded into the PC.
- JZ addr: Jump conditional (on zero). Contents of the two following bytes are loaded into the PC if the Z flag is set.
- CALL addr: Call address: Contents of PC are pushed on to the stack, and the PC is loaded as in a JMP.
- RET: Return: Pop the stack into the PC.
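To see how these instructions fit together, here is a small fetch-decode-execute loop in C. It is a sketch under several assumptions: only registers A and B are modelled, only the zero flag is tracked, and any opcode outside the handful listed simply stops the loop. The opcode values are those of the 8085; the names `machine`, `fetch16` and `run` are made up for illustration.

```c
#include <stdint.h>

/* 8085 opcodes for the few instructions modelled here. */
enum { MVI_B = 0x06, MVI_A = 0x3E, HLT = 0x76, ADD_B = 0x80,
       JMP = 0xC3, RET = 0xC9, JZ = 0xCA, CALL = 0xCD };

typedef struct {
    uint8_t  a, b;          /* accumulator and register B */
    uint8_t  z;             /* zero flag (the only flag modelled) */
    uint16_t pc, sp;
    uint8_t  mem[65536];
} machine;

/* Read the two address bytes following an opcode, low byte first. */
static uint16_t fetch16(machine *m) {
    uint16_t lo = m->mem[m->pc++];
    uint16_t hi = m->mem[m->pc++];
    return (uint16_t)((hi << 8) | lo);
}

/* Execute until HLT (or an opcode this sketch does not know); return A. */
static uint8_t run(machine *m) {
    for (;;) {
        uint8_t op = m->mem[m->pc++];
        uint16_t addr;
        switch (op) {
        case MVI_A: m->a = m->mem[m->pc++]; break;
        case MVI_B: m->b = m->mem[m->pc++]; break;
        case ADD_B:
            m->a = (uint8_t)(m->a + m->b);
            m->z = (m->a == 0);
            break;
        case JMP: m->pc = fetch16(m); break;
        case JZ:
            addr = fetch16(m);
            if (m->z) m->pc = addr;
            break;
        case CALL:
            addr = fetch16(m);                         /* PC now points past the CALL */
            m->mem[--m->sp] = (uint8_t)(m->pc >> 8);
            m->mem[--m->sp] = (uint8_t)(m->pc & 0xFF);
            m->pc = addr;
            break;
        case RET: {
            uint16_t lo = m->mem[m->sp++];
            uint16_t hi = m->mem[m->sp++];
            m->pc = (uint16_t)((hi << 8) | lo);
            break;
        }
        case HLT:
        default:
            return m->a;
        }
    }
}
```

Note that CALL pushes the address of the instruction after the CALL, which is what lets RET continue where the caller left off.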

Privileged Mode

The 8085 did not have a privileged mode of operation. User programs and the operating system had the same ability to interact with the hardware. The Intel i286 (same as 80286), launched in 1982, was the first of the Intel family to have a protected mode.

There are three resources to protect,

the manipulation of the privilege mode itself,
accessibility to regions of physical memory,
hardware devices such as the keyboard or disk.

Privileged mode blocks certain instructions from executing unless the processor is in privileged mode. It also works with the virtual memory system so that the addresses named in an instruction are not the addresses in RAM. There were more reasons to take this step than protection, but it did mean that a process could only access mapped addresses.

To enter privileged mode, an instruction was provided that trapped into the kernel at a specific, defined address. The kernel takes it from there to determine why it ended up in privileged mode, for example to take an action on behalf of a user program, such as access to a disk.

Traps look like calls. Interrupts are a form of trap that occur not because of an instruction, but because of a hardware event. These also enter privileged mode at a pre-determined address in the kernel, to service the hardware event.

A special return instruction returned from the trap and closed the privilege back down, simultaneously with the return to the user program.

Interrupts

Interrupts and Exceptions

The interrupt mechanism was reused in the creation of the trap, in order to enter into privileged mode safely. A trap is a kind of exception that requests, as does an interrupt, that normal instruction execution be suspended, and kernel mode entered in order to handle something exceptional.

Interrupts come in asynchronously from the outside world.
Exceptions are a consequence of instruction execution.

Because exceptions are caused by instruction execution, they fall into three classes based on how to recover from the exception,

The fault. The reason for the instruction faulting can be remedied. The kernel should fix the problem and return to retry the faulting instruction.
The trap. The instruction caused the trap as a software interrupt. The kernel should handle the request and return to the instruction following the trapping instruction.
The abort. Bad things happened. There is no reasonable way for the thread to continue executing.
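The practical difference between the three classes is where execution resumes. A minimal sketch in C, where `exc_class`, `NO_RESUME` and `resume_address` are names invented for illustration and the length of the offending instruction is assumed to be known:

```c
#include <stdint.h>

typedef enum { EXC_FAULT, EXC_TRAP, EXC_ABORT } exc_class;

/* Hypothetical marker meaning "this thread cannot continue". */
#define NO_RESUME 0xFFFFu

/*
 * Given the address and length of the instruction that raised the
 * exception, return the address at which the kernel resumes the thread.
 */
static uint16_t resume_address(exc_class cls, uint16_t pc, uint16_t len) {
    switch (cls) {
    case EXC_FAULT: return pc;                    /* retry the same instruction */
    case EXC_TRAP:  return (uint16_t)(pc + len);  /* continue after it */
    default:        return NO_RESUME;             /* abort: nowhere to return */
    }
}
```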

While in privileged mode, an instruction is allowed that will drop the privilege. Entering privileged mode must be completely protected and under the entire control of the kernel.

Interrupts to service hardware must enter privileged mode, to carry out what is required by hardware. If it is an interrupt from the disk, it must be able to issue instructions to the disk to handle the interrupt. These service routines are the drivers, and are considered part of the kernel, so it is correct that they share privileged mode with the kernel. Therefore, among the processor's responses to an interrupt, it will obtain privileged mode coincidentally with calling the service routine.

When a user program requires a privileged mode service, it mimics the interrupt by issuing a software-initiated interrupt, called a trap. As does the interrupt, the trap will force a call into the kernel, while at the same time obtaining privileged mode.

The response to a trap or an interrupt is completely under control of the kernel. Before enabling interrupts or starting a user mode process, the kernel places the addresses of interrupt and trap service routines in a table. This table is in memory inaccessible to any process but the kernel. The location of the table is installed into the interrupt handling mechanism using a privileged mode instruction. Once privilege mode is dropped, the response to an interrupt or trap cannot be modified. A user mode process is denied the hardware access necessary to manipulate the response.
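The protection argument above can be made concrete with a toy model in C. Everything here is hypothetical: `sim_cpu`, `load_vector_table`, `drop_privilege`, `take_vector` and `count_handler` are invented names, and a real processor enforces these rules in hardware rather than with `if` statements.

```c
#include <stdbool.h>
#include <stddef.h>

#define NVECTORS 8

typedef struct sim_cpu sim_cpu;
typedef void (*handler_fn)(sim_cpu *);

struct sim_cpu {
    bool privileged;            /* current mode of the simulated CPU */
    const handler_fn *table;    /* table of service routines, set by the kernel */
    int serviced;               /* bookkeeping for the sample handler */
};

/* Privileged instruction: install the table. Refused in user mode. */
static bool load_vector_table(sim_cpu *c, const handler_fn *t) {
    if (!c->privileged) return false;
    c->table = t;
    return true;
}

/* Allowed only while privileged; after this the table cannot be changed. */
static void drop_privilege(sim_cpu *c) {
    c->privileged = false;
}

/* Interrupt or trap n: obtain privilege and call the routine in one step. */
static bool take_vector(sim_cpu *c, unsigned n) {
    if (n >= NVECTORS || c->table == NULL || c->table[n] == NULL) return false;
    bool saved = c->privileged;
    c->privileged = true;       /* enter privileged mode with the call */
    c->table[n](c);
    c->privileged = saved;      /* the special return restores the old mode */
    return true;
}

/* Sample service routine: counts only if it really runs privileged. */
static void count_handler(sim_cpu *c) {
    if (c->privileged) c->serviced++;
}
```

The point of the design is that the only way from user mode into privileged mode is through `take_vector`, and where that lands was decided by the kernel before privilege was dropped.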

Virtual Memory

A major conceptual and technical leap was made with the introduction of virtual memory. Virtual memory solved several problems,

It provided a mechanism to protect memory, for instance, contain processes in isolated memory spaces,
It allowed processes to see a simplified linear address space, without interference between processes.
It allowed the backing store to the virtual memory to be built out of various kinds of physical memory.

Virtual Memory is implemented by a Memory Management Unit (MMU), a subcomponent of the CPU. The MMU was introduced in the Intel i386 (same as 80386) in 1985.

The MMU is involved in every memory fetch to map a virtual address to a physical address. This map is done as a table lookup in tables called page tables. The operating system maintains the page tables and the MMU uses them automatically.

The mapping is entirely under the control of the operating system, whose code is in privileged memory and works on page tables that are in privileged memory, so no memory can be accessed without the operating system permitting it. There will be a separate page table for each process, and this effectively boxes a process into a memory space isolated from the memory space of other processes.

☢ Software speaks only virtual addresses ☢

There are several systems for page tables, and I describe that which is similar to what Intel has created but simpler.

The MMU does not map byte-by-byte, but large ranges of bytes to large ranges of bytes. This cuts down the size of the page table. The i386 had already progressed to 32 bit addresses. Virtual memory for 16 bit processors never happened, but if it had, this is how it might have worked.

The virtual memory that never was

The following is an explanation of a fictitious MMU the 8085 never had. It is inspired however by the Page Size Extension method of the Intel Pentium Pro (schematic above).

A special MMU register is loaded with a physical address called the page table base B. A 16 bit virtual address V is separated into the top 7 bits of page number P and the lower 9 bits of offset O, such that V = P * 2⁹ + O.

The MMU fetches the one byte page table entry E from the page table at physical address B+P. If the least significant bit of E is zero, the entry is not valid, and the MMU interrupts the processor. Else, the 7 top bits of E are copied to the 7 top bits of V, producing the mapped physical address.

When changing between processes or entering or leaving the kernel, reloading the page table base in the MMU installs the appropriate page table.
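The translation just described, with one-byte page table entries, can be sketched in C. The function name `translate` and the macro names are mine, and a `false` return stands in for the MMU interrupting the processor.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 9
#define OFFSET_MASK 0x01FFu   /* the low 9 bits of an address */
#define PTE_VALID   0x01u     /* least significant bit of an entry */

/*
 * Map virtual address v through the page table found at physical
 * address base in mem. On success the physical address is stored in
 * *phys; false means the entry was not valid.
 */
static bool translate(const uint8_t *mem, uint16_t base,
                      uint16_t v, uint16_t *phys) {
    uint16_t p = v >> OFFSET_BITS;              /* page number P, top 7 bits */
    uint16_t o = v & OFFSET_MASK;               /* offset O, low 9 bits */
    uint8_t  e = mem[(uint16_t)(base + p)];     /* entry E at B + P */

    if ((e & PTE_VALID) == 0) return false;

    /* The top 7 bits of E become the top 7 bits of the address. */
    *phys = (uint16_t)(((e >> 1) << OFFSET_BITS) | o);
    return true;
}
```

For example, if the entry for page 3 names frame 0x55, the virtual address 0x07A5 (page 3, offset 0x1A5) maps to physical address 0xABA5.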

The virtual memory that never was ..

The 16 bit address is divided into the top 7 bits and the bottom 9 bits.

As a virtual address, the top 7 bits are the page number.
As a physical address, the top 7 bits are the page frame.
The bottom 9 bits are common to both worlds, and are the offset into the page.

Because the offset into the page is 9 bits, the page size is 512 bytes.

Because the page number is 7 bits, there are 128 pages. The 64k physical memory is likewise divided into 128 page frames. A page table fits into a page by having a 4 byte mapping for each of the 128 pages, working out to exactly 512 bytes for the table.

An MMU register is loaded with the half frame number of the page table. This would be an 8 bit register. The page number is multiplied by 2 (a left shift) to provide the bottom 8 bits of a 16 bit physical address, with the MMU register providing the top 8 bits.

The 16 bit data word at this address is fetched, and 7 of the 16 bits are extracted as the page frame in physical memory where this virtual page resides.

The offset is or'ed into the frame as the lower 9 bits to form the physical address of the byte to fetch.

In fact, additions are simplified by having everything aligned at powers of two. Therefore rather than adding, all one does is replace zeros by the number.
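A short C sketch shows why alignment lets an OR stand in for an addition: the shifted frame number has zeros in its low 9 bits, so no carry can occur. The function names are made up for illustration.

```c
#include <stdint.h>

/* Combine a 7 bit frame number and a 9 bit offset by replacing zeros. */
static uint16_t join_or(uint16_t frame, uint16_t offset) {
    return (uint16_t)((frame << 9) | (offset & 0x1FF));
}

/* The same combination written as an addition. */
static uint16_t join_add(uint16_t frame, uint16_t offset) {
    return (uint16_t)((frame << 9) + (offset & 0x1FF));
}
```

The two functions agree for every one of the 128 frames and 512 offsets.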

Operating Systems use of Virtual Memory

Linux 32-bit virtual memory

The swap of page tables is quickly done by loading the base (physical) address of the appropriate page table into an MMU register. However, the 32-bit Linux operating system has an additional mechanism when entering and leaving the kernel's protected address space. The 4G virtual memory range of 32-bit operating systems is partitioned into the lower 3G for user, and the upper 1G for the kernel.

Intel provides virtual memory space by segments that describe a range and a permission on the range. When in kernel mode the segment spans the entire 4G range. Moving from kernel to user, the segment is restricted to just the lower 3G. While different page tables map the lower 3G differently, they all map the upper 1G the same: that virtual memory region is that of the kernel.

Therefore, when running in the kernel the operating system can access the current user memory space as well as the kernel space in a single range. The kernel can swap between user spaces without affecting its access to the kernel space. It does however limit the user to only 3G of virtual memory.
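The 3G/1G partition amounts to a single comparison against the boundary address. A small sketch in C, where `KERNEL_BASE`, `is_kernel_address` and `access_allowed` are illustrative names and the boundary is taken to be exactly 3G, 0xC0000000:

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of the kernel's upper 1G under the 3G/1G partition. */
#define KERNEL_BASE 0xC0000000u

static bool is_kernel_address(uint32_t vaddr) {
    return vaddr >= KERNEL_BASE;
}

/* User mode sees only the lower 3G; kernel mode sees the whole 4G. */
static bool access_allowed(uint32_t vaddr, bool kernel_mode) {
    return kernel_mode || !is_kernel_address(vaddr);
}
```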

The transition to or from privileged mode is accompanied by additional swaps of context. To provide complete protection of the kernel from the user process, interrupts and traps do not push onto the user's stack, nor use the user's page tables. Entry to privilege mode swaps in the kernel's stack and kernel's page tables just as exit from privilege mode restores the user's stack and page tables.

In the case of Intel, there is one interrupt stack per core that is activated when handling an interrupt. Each thread has its own kernel stack that is activated on exception by that thread.

Context switch:

The kernel resident stack, whether the interrupt or thread stack, can be completed to contain all registers of the CPU. In the case of the 8085, this means that the handler of the interrupt or trap will push the SP, BC, DE, HL and flags register. Note that the SP register must be interpreted using the page table of the interrupted or trapped process.

Also pushed are the references to the page tables, that is, whatever is installed into the MMU to activate the virtual-physical address mapping for this process.

The contents of the stack can then be copied off into a Thread Context Block (TCB) that is stored among other such TCBs in a linked list inside the memory space of the kernel. If a different TCB is selected and its saved registers are copied into the interrupt or thread stack, the physical thread of the core will return into the process context and specific code execution of that thread, picking up where that computation left off.

This is called a context switch, and pre-emptive multiprocessing can be achieved by attaching such a context switch to the handling of a clock interrupt, say every tenth of a second, for a scheduling time quantum of one tenth of a second.
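Here is a sketch in C of the copy-out, select, copy-in sequence. The structures and the name `context_switch` are invented for illustration, the register set is that of the 8085, and the choice of the next thread is assumed to be simple round-robin over a circular list, which the text above does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Register image of one thread; the field names are illustrative. */
typedef struct {
    uint16_t pc, sp, bc, de, hl;
    uint8_t  a, flags;
    uint16_t page_table_base;    /* what gets installed into the MMU */
} context;

typedef struct tcb {
    context     saved;           /* registers of a thread that is not running */
    struct tcb *next;            /* assumed circular list of TCBs */
} tcb;

/*
 * Called while handling the clock interrupt. `live` is the register
 * image that the handler pushed on the kernel resident stack. Save it
 * into the current TCB, select the next TCB, and overwrite `live` so
 * that the return from the interrupt resumes the selected thread.
 */
static tcb *context_switch(tcb *current, context *live) {
    current->saved = *live;      /* copy the stack contents off into the TCB */
    tcb *next = current->next;   /* round-robin selection (an assumption) */
    *live = next->saved;         /* copy the selected context back in */
    return next;
}
```

Each call to `context_switch` from the clock handler gives the next thread in the list its time quantum.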

Stack and Call ABI

        |        |
        +--------+
        |  b hi  |
        +--------+  assume sizeof(int)==2
        |  b lo  |
        +--------+
        |  a hi  |
        +--------+  assume sizeof(int)==2
        |  a lo  |
        +--------+  -+
        |        |   |
        |        |   |
        +-   s  -+    > sizeof(struct S) 
        |        |   |
        |        |   |
        +--------+  -+
        |  PC hi |
        +--------+  return address
 SP --> |  PC lo |
        +--------+
        |        |

The stack after the call to f().
Note: a hypothetical ABI

Stacks are used extensively for subroutine calls. Let us see how the 8085 would handle a call to this function:


    #define LEN 4   /* array length; an arbitrary value for illustration */

    struct S {
        int len ;
        int r[LEN] ;
    } ;

    struct S f(int a, int b) ;

    int main(int argc, char * argv[]) {
        struct S s ;
        int a = 1, b = 2 ;   /* arbitrary values for illustration */
        s = f( a, b ) ;
        return 0 ;
    }

The convention by which subroutines pass arguments is called the Application Binary Interface (ABI). Aspects of the ABI are arbitrary, but all code must agree on the ABI to interoperate.

The ABI described here will use the stack to pass all arguments and the return value. The arguments are pushed rightmost argument first, then the stack pointer is lowered by the storage size of the return value, in this example, the number of bytes of struct S.

This ABI requires that the caller clean up the stack. When f returns, the old PC value is at the top of the stack, and is popped off into the PC register. The return value is conveniently on the top of stack, and is copied into s. The stack is then moved up by the total number of bytes of the parameters and return value,

    SP += sizeof(struct S) + 2 * sizeof(int)
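The stack picture for this hypothetical ABI can be computed rather than drawn. The sketch below is in C, with `frame_layout` and `layout_for` as invented names; it assumes, as the figure does, that an int and a return address are each two bytes, and it takes the size of the return value as a parameter.

```c
#include <stddef.h>

#define INT_SIZE 2   /* the figure assumes sizeof(int) == 2 */
#define PC_SIZE  2   /* a 16 bit return address */

/* Byte offsets from SP on entry to f(), plus the caller's clean up. */
typedef struct {
    size_t ret_addr;   /* where the saved PC sits */
    size_t ret_val;    /* start of the space reserved for struct S */
    size_t arg_a;      /* first argument */
    size_t arg_b;      /* second argument, pushed first */
    size_t cleanup;    /* bytes the caller adds to SP after the return */
} frame_layout;

static frame_layout layout_for(size_t ret_size) {
    frame_layout f;
    f.ret_addr = 0;
    f.ret_val  = PC_SIZE;
    f.arg_a    = f.ret_val + ret_size;
    f.arg_b    = f.arg_a + INT_SIZE;
    f.cleanup  = ret_size + 2 * INT_SIZE;   /* the PC was already popped by the return */
    return f;
}
```

With a ten byte struct S, the return value starts 2 bytes above SP, a is at offset 12, b at offset 14, and the caller adds 14 to SP after copying out the result.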

The compiler knows the default ABI for its platform, and compiles code consistent with the ABI's requirements. What I have shown is hypothetical. In the cdecl calling convention, the caller would allocate space for a temporary struct, typically in its own stack frame, and pass a pointer to the struct as a hidden first parameter.
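The hidden first parameter can be pictured as the compiler rewriting the function to write its result through a pointer to caller-provided storage. The C below is only an illustration of that idea: the value of `LEN`, the body of `f` and the name `f_lowered` are all invented, and a real compiler performs this rewriting internally rather than in source code.

```c
#define LEN 4   /* hypothetical; the example leaves LEN unspecified */

struct S {
    int len;
    int r[LEN];
};

/* What the programmer writes: a struct returned by value. */
static struct S f(int a, int b) {
    struct S s;
    s.len = LEN;
    s.r[0] = a;
    s.r[1] = b;
    s.r[2] = a + b;
    s.r[3] = a - b;
    return s;
}

/* Roughly what is arranged: the result goes through a hidden pointer. */
static void f_lowered(struct S *result, int a, int b) {
    result->len = LEN;
    result->r[0] = a;
    result->r[1] = b;
    result->r[2] = a + b;
    result->r[3] = a - b;
}
```

A call `s = f(a, b)` then behaves like `f_lowered(&s, a, b)`, possibly through a temporary.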

It is also common that the return value, if it fits into a CPU register, is returned by a register.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Author: Burton Rosenberg
Created: 10 September 2019
Last Update: 30 September 2024