Architecture Overview

1. Purpose of This Chapter

 

This chapter provides a top-level architectural description of the EMulatR system.

Its purpose is to explain what components exist, how they interact, and how execution flows, without delving into instruction-level implementation details.

 

This chapter is normative.

Subsystems and future refactors must preserve the invariants described here.

 


 

2. System Overview

 

EMulatR is a cycle-based, SMP-capable Alpha AXP processor emulator designed to prioritize:

 

Architectural correctness

Deterministic and debuggable execution

Faithful modeling of Alpha’s weak memory ordering

Clean separation of concerns between execution, memory, PAL, and devices

 

Each Alpha CPU instance executes independently but coordinates with others through shared memory, inter-processor interrupts, and coherency mechanisms.

 


 

3. Execution Model

 

3.1 Cycle-Based Execution

 

Execution proceeds in a cycle-based run loop:

 

One iteration of the run loop corresponds to one hardware clock cycle

Each CPU owns its own run loop

Forward progress occurs unless explicitly stalled by architectural conditions

 

The run loop coordinates:

 

Pipeline advancement

Exception and interrupt delivery

Barrier and serialization enforcement

SMP coordination

 


 

3.2 High-Level Execution Flow

 

At a conceptual level, each cycle performs:

 

1. Pre-cycle checks (halt, interrupts, exceptions)

2. Pipeline advancement (writeback → fetch)

3. Barrier release evaluation

4. Branch resolution

5. Exception and interrupt delivery

6. State updates and housekeeping

 

Execution continues until a halt condition occurs.
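
The six phases above can be sketched as a per-CPU loop. This is a minimal illustration, not the actual EMulatR run loop: the type `CpuSketch`, its method names, the `haltAfter` test hook, and the `trace` vector are all hypothetical and exist only to make the phase ordering observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-CPU run loop; every name here is illustrative.
struct CpuSketch {
    bool halted = false;
    unsigned long cycles = 0;
    unsigned long haltAfter = 0;      // test hook: request halt at this cycle count
    std::vector<std::string> trace;   // records the order in which phases ran

    bool preCycleChecks() {           // 1. halt, interrupts, exceptions
        trace.push_back("pre");
        if (cycles >= haltAfter) halted = true;
        return !halted;
    }
    void advancePipeline()  { trace.push_back("advance"); }          // 2. writeback -> fetch
    void evaluateBarriers() { trace.push_back("barrier"); }          // 3. barrier release
    void resolveBranches()  { trace.push_back("branch"); }           // 4. branch resolution
    void deliverEvents()    { trace.push_back("deliver"); }          // 5. exceptions, interrupts
    void housekeeping()     { trace.push_back("house"); ++cycles; }  // 6. state updates

    // One call models one hardware clock cycle; returns false once halted.
    bool stepCycle() {
        if (!preCycleChecks()) return false;
        advancePipeline();
        evaluateBarriers();
        resolveBranches();
        deliverEvents();
        housekeeping();
        return true;
    }

    void run() { while (stepCycle()) {} }
};
```

Keeping one cycle behind a single `stepCycle()` call makes it straightforward to single-step a CPU from a debugger or a test.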

 


 

4. Pipeline Architecture

 

4.1 Six-Stage Pipeline

 

EMulatR models a six-stage Alpha pipeline:

 

| Stage | Name      | Responsibility                         |
| ----- | --------- | -------------------------------------- |
| 0     | Fetch     | Instruction fetch and frontend control |
| 1     | Decode    | Grain association and validation       |
| 2     | Issue     | Hazard detection and operand readiness |
| 3     | Execute   | All architectural work happens here    |
| 4     | Memory    | Advancement only (work already done)   |
| 5     | Writeback | Architectural commit point             |

 

Invariant:

Writeback (WB) is the only stage where architectural state is committed.

 


 

4.2 PipelineSlot Contract

 

Each pipeline stage holds a `PipelineSlot`, which encapsulates:

 

Decoded instruction (“grain”)

Operand and result storage

Fault and exception state

Stall and serialization flags

 

Slots may be:

 

Valid

Stalled

Flushed (invalidated)

 

Slots younger than a faulting instruction, a mispredicted branch, or a PAL entry may be discarded without committing state.
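
The contract above might look roughly like the following. This is a sketch under stated assumptions: the chapter does not show the real `PipelineSlot` layout, so the field names, the `GrainRef` stand-in, and the helper methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The three slot states listed in this section.
enum class SlotState { Valid, Stalled, Flushed };

// Hypothetical stand-in for a decoded instruction ("grain").
struct GrainRef {
    std::uint32_t rawBits = 0;
};

struct PipelineSlotSketch {
    SlotState state = SlotState::Flushed;  // a fresh slot holds nothing committable
    GrainRef grain;                        // decoded instruction
    std::uint64_t operandA = 0;            // operand storage
    std::uint64_t operandB = 0;
    std::uint64_t result = 0;              // held here until writeback
    bool faulted = false;                  // fault and exception state
    int faultCode = 0;
    bool serializing = false;              // serialization flag

    void recordFault(int code) { faulted = true; faultCode = code; }

    // Invalidate the slot so that it can never commit.
    void flush() {
        state = SlotState::Flushed;
        faulted = false;
        faultCode = 0;
    }

    // Only a valid, fault-free slot may reach the commit point.
    bool mayCommit() const { return state == SlotState::Valid && !faulted; }
};
```

Because results stay inside the slot until writeback, flushing a slot is enough to guarantee that a discarded instruction leaves no architectural trace.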

 


 

4.3 Backward Advancement

 

The pipeline advances backward (WB → MEM → EX → Issue → Decode → Fetch) each cycle.

 

This ordering:

 

Prevents write-after-read hazards on the slots (no stage overwrites a slot before the next stage has consumed it)

Ensures precise exceptions

Allows speculative slots to be safely invalidated
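
A minimal sketch of backward advancement, assuming the stage numbering from the table in 4.1. The `Slot` and `PipelineSketch` types and the integer instruction tags are hypothetical; stalls and flushes are omitted to keep the ordering visible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stage indices follow the six-stage table; StageCount is a helper value.
enum Stage : std::size_t { Fetch = 0, Decode, Issue, Execute, Memory, Writeback, StageCount };

struct Slot {
    bool valid = false;
    int id = 0;          // illustrative instruction tag
};

struct PipelineSketch {
    std::array<Slot, StageCount> slots{};
    int committed = 0;   // tag of the most recently committed instruction
    int nextId = 1;

    // Walk from Writeback toward Fetch so that every slot is consumed by
    // the stage after it before the stage before it overwrites it.
    void advance() {
        if (slots[Writeback].valid) {
            committed = slots[Writeback].id;   // sole commit point
        }
        for (std::size_t s = Writeback; s > Fetch; --s) {
            slots[s] = slots[s - 1];           // pull from the younger stage
        }
        slots[Fetch].valid = true;             // bring in a new instruction
        slots[Fetch].id = nextId++;
    }
};
```

With this ordering a single array is sufficient. Walking from Fetch toward Writeback instead would require a second copy of the slots to avoid overwriting instructions that have not yet moved.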

 


 

5. Functional Execution Domains (“Boxes”)

 

Execution is divided into functional domains, referred to as boxes.

Boxes are execution domains, not pipeline stages.

 


 

5.1 IBox – Instruction Frontend

 

The IBox is responsible for:

 

Instruction fetch

Grain resolution and decode caching

Branch target calculation

Frontend stall enforcement

 

Decoding occurs once per instruction pattern; cached grains may be reused many times.
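
One way to picture decode caching is a map from instruction word to decoded grain. The sketch below assumes the cache is keyed by the raw 32-bit instruction word, which the chapter does not state; `DecodedGrain` and `GrainCacheSketch` are hypothetical names. The field positions (primary opcode in bits 31:26, Ra in bits 25:21) follow the Alpha instruction formats.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical decoded form; a real grain would carry far more.
struct DecodedGrain {
    std::uint8_t opcode = 0;   // Alpha primary opcode, bits 31:26
    std::uint8_t ra = 0;       // Ra register field, bits 25:21
};

class GrainCacheSketch {
public:
    // Returns the cached grain, decoding only on the first encounter.
    const DecodedGrain& resolve(std::uint32_t word) {
        auto it = cache_.find(word);
        if (it != cache_.end()) return it->second;   // reuse cached grain
        ++decodes_;                                  // decode happens once per word
        DecodedGrain g;
        g.opcode = static_cast<std::uint8_t>(word >> 26);
        g.ra = static_cast<std::uint8_t>((word >> 21) & 0x1F);
        return cache_.emplace(word, g).first->second;
    }

    unsigned decodeCount() const { return decodes_; }

private:
    std::unordered_map<std::uint32_t, DecodedGrain> cache_;
    unsigned decodes_ = 0;
};
```

In a tight guest loop the same words are fetched repeatedly, so `decodeCount()` stays flat while `resolve()` is called every iteration.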

 


 

5.2 EBox – Integer and Address Execution

 

The EBox performs:

 

Integer arithmetic and logic

Address calculation

Conditional evaluation

Dispatch to memory and coherency services

 

 


 

5.3 FBox – Floating-Point Execution

 

The FBox executes floating-point operations and maintains FP state.

 

Execution policy:

 

Floating-point operations execute synchronously in the EX stage

The pipeline stalls in EX until the FP operation completes

No asynchronous FP completion tracking is used

 

This design favors simplicity, correctness, and debuggability.

 


 

5.4 MBox – Memory Operations

 

The MBox coordinates:

 

Load and store execution

DTB translation

Cache access

Interaction with memory barriers

 

All memory accesses occur during the EX stage via the MBox.

 


 

5.5 CBox – Coherency and Serialization

 

The CBox manages:

 

Barrier instructions (MB, WMB, EXCB, TRAPB)

Write buffer draining

LL/SC reservation coordination

SMP coherency hooks

 

Barriers are enforced by stalling the pipeline frontend until release conditions are met.
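
The stall-until-release behaviour can be sketched as a small state machine. The release conditions used here are assumptions chosen for illustration (write buffer empty for MB and WMB, no older trapping instruction in flight for EXCB and TRAPB); the chapter does not spell them out, and `BarrierStateSketch` is a hypothetical name.

```cpp
#include <cassert>

enum class BarrierKind { None, MB, WMB, EXCB, TRAPB };

// Hypothetical barrier tracker consulted once per cycle by the run loop.
struct BarrierStateSketch {
    BarrierKind pending = BarrierKind::None;
    unsigned bufferedStores = 0;      // stores not yet drained to memory
    unsigned inFlightFaultable = 0;   // older instructions that may still trap

    void raise(BarrierKind k) { pending = k; }

    // Assumed release conditions, for illustration only.
    bool releaseConditionMet() const {
        switch (pending) {
            case BarrierKind::None:  return true;
            case BarrierKind::MB:
            case BarrierKind::WMB:   return bufferedStores == 0;
            case BarrierKind::EXCB:
            case BarrierKind::TRAPB: return inFlightFaultable == 0;
        }
        return true;
    }

    // Returns true when the frontend is allowed to fetch this cycle.
    bool evaluate() {
        if (pending != BarrierKind::None && releaseConditionMet()) {
            pending = BarrierKind::None;
        }
        return pending == BarrierKind::None;
    }
};
```

Blocking fetch rather than individual stages keeps the barrier logic in one place: nothing younger than the barrier ever enters the pipeline, so nothing needs to be squashed later.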

 


 

5.6 PalBox – Privileged Execution

 

The PalBox executes PAL functionality:

 

CALL_PAL dispatch

Privileged register access (MFPR, MTPR)

Processor mode transitions

Interrupt and exception handoff

 

PAL executes inside the pipeline and is treated as a serialization boundary.

 


 

6. Memory System Architecture

 

6.1 GuestMemory and SafeMemory

 

GuestMemory is the shared physical address space visible to all CPUs

SafeMemory provides the backing store and safety mechanisms

MMIO regions are routed through the memory system but obey stronger ordering rules

 

GuestMemory is the sole authority for:

 

Physical memory visibility

LL/SC reservation invalidation

SMP coherency effects
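
Centralizing reservation invalidation could be sketched as a per-CPU table owned by the shared memory object. Everything here is an assumption made for illustration: the name `ReservationTableSketch`, the 16-byte reservation granule, and the rule that a store from one CPU clears matching reservations held by the others.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reservation table: at most one reservation per CPU.
class ReservationTableSketch {
public:
    explicit ReservationTableSketch(unsigned cpuCount) : slots_(cpuCount) {}

    // Load-locked: record the reserved block for this CPU.
    void loadLocked(unsigned cpu, std::uint64_t addr) {
        slots_[cpu].valid = true;
        slots_[cpu].block = blockOf(addr);
    }

    // A store by `cpu` clears other CPUs' reservations on the same block.
    void observeStore(unsigned cpu, std::uint64_t addr) {
        const std::uint64_t block = blockOf(addr);
        for (unsigned i = 0; i < slots_.size(); ++i) {
            if (i != cpu && slots_[i].valid && slots_[i].block == block) {
                slots_[i].valid = false;
            }
        }
    }

    // Store-conditional: succeeds only if the reservation survived.
    bool storeConditional(unsigned cpu, std::uint64_t addr) {
        const bool ok = slots_[cpu].valid && slots_[cpu].block == blockOf(addr);
        slots_[cpu].valid = false;        // the reservation is consumed either way
        if (ok) observeStore(cpu, addr);
        return ok;
    }

    void clear(unsigned cpu) { slots_[cpu].valid = false; }   // e.g. on exception delivery

private:
    // Assumed 16-byte granule; the real granule is not given in this chapter.
    static std::uint64_t blockOf(std::uint64_t addr) { return addr & ~std::uint64_t(0xF); }

    struct Entry {
        bool valid = false;
        std::uint64_t block = 0;
    };
    std::vector<Entry> slots_;
};
```

Keeping the table next to the memory it guards means every store path passes through the same invalidation check, whichever CPU issues it.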

 


 

 

6.2 Memory Ordering Model

EMulatR models Alpha’s weakly ordered memory architecture with explicit serialization:

 

Loads complete synchronously in EX

Stores are buffered and may drain asynchronously

No implicit ordering is guaranteed

 

Barrier instructions (MB, WMB, EXCB, TRAPB):

 

Stall the pipeline frontend

Prevent speculation past the barrier

Release only after required drain conditions are met

 

MMIO accesses are strongly ordered and never buffered.
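
The difference between buffered stores and unbuffered MMIO can be sketched as follows. This is a simplified model under stated assumptions: `StorePathSketch`, the single `mmioBase` threshold that separates RAM from MMIO, and the map used as backing store are all hypothetical. `visible()` reports what other observers would see, not what the storing CPU itself would read back.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>

struct PendingStore {
    std::uint64_t addr;
    std::uint64_t value;
};

// Hypothetical store path: RAM stores are buffered, MMIO stores are not.
class StorePathSketch {
public:
    explicit StorePathSketch(std::uint64_t mmioBase) : mmioBase_(mmioBase) {}

    void store(std::uint64_t addr, std::uint64_t value) {
        if (addr >= mmioBase_) {
            backing_[addr] = value;                        // MMIO: never buffered
        } else {
            buffer_.push_back(PendingStore{addr, value});  // RAM: buffered
        }
    }

    // Drain one buffered store, oldest first; models asynchronous draining.
    void drainOne() {
        if (buffer_.empty()) return;
        backing_[buffer_.front().addr] = buffer_.front().value;
        buffer_.pop_front();
    }

    void drainAll() { while (!buffer_.empty()) drainOne(); }

    // A barrier would wait for this to become true before releasing.
    bool drained() const { return buffer_.empty(); }

    // Value visible to other observers (0 if never written).
    std::uint64_t visible(std::uint64_t addr) const {
        auto it = backing_.find(addr);
        return it == backing_.end() ? 0 : it->second;
    }

private:
    std::uint64_t mmioBase_;
    std::deque<PendingStore> buffer_;
    std::map<std::uint64_t, std::uint64_t> backing_;
};
```

A buffered store is invisible to other observers until it drains, which is exactly the window that a barrier closes by waiting on `drained()`.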

 


 

7. Serialization and Stall Model

 

7.1 Stall Sources

 

Pipeline stalls may occur due to:

 

Memory barriers

Fault dispatch

TLB misses

Floating-point EX occupancy

PAL entry/exit

MMIO wait states (future)

Branch misprediction recovery

 


 

7.2 Speculation Policy

 

Speculative execution may proceed past loads

Speculation must not proceed past barriers

Barriers block frontend fetch until released

 

This policy preserves Alpha ordering semantics while allowing forward progress.

 


 

8. Exceptions, Faults, and Interrupts

 

8.1 FaultDispatcher

 

All exceptional conditions are queued and prioritized by the FaultDispatcher.

 

Properties:

 

Precise exception delivery

Strict priority ordering

Single architectural delivery point
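
Queueing with strict priority and a single delivery point can be illustrated with a priority queue. The priority scheme below (higher number first, earlier arrival wins ties) is an assumption; the actual event classes and their ordering are not given in this chapter, and `FaultDispatcherSketch` is a hypothetical stand-in for the real FaultDispatcher.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct PendingEvent {
    int priority;   // higher value is delivered first (assumption)
    int code;       // illustrative event identifier
    unsigned seq;   // arrival order, used as a tie-breaker
};

// Orders the queue so that top() is the highest-priority, oldest event.
struct EventOrder {
    bool operator()(const PendingEvent& a, const PendingEvent& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;   // earlier arrival wins among equals
    }
};

class FaultDispatcherSketch {
public:
    void raise(int priority, int code) {
        queue_.push(PendingEvent{priority, code, nextSeq_++});
    }

    bool pending() const { return !queue_.empty(); }

    // Single delivery point: hands out exactly one event per call.
    PendingEvent deliverNext() {
        assert(pending());
        PendingEvent e = queue_.top();
        queue_.pop();
        return e;
    }

private:
    std::priority_queue<PendingEvent, std::vector<PendingEvent>, EventOrder> queue_;
    unsigned nextSeq_ = 0;
};
```

The sequence number makes delivery order fully deterministic even when several events share a priority, which matters for reproducible debugging.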

 


 

8.2 Pipeline Interaction

 

When an exception is delivered:

 

All younger pipeline slots are flushed

Architectural state already committed at WB by older instructions is preserved

LL/SC reservations are cleared

Execution vectors to PAL or kernel handlers
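
The delivery steps above can be sketched as a single function. This is an illustration under assumptions: index 0 is Fetch (youngest) and index 5 is Writeback (oldest), the faulting instruction's own slot is discarded along with the younger ones, and `DeliveryContextSketch` and `handlerPc` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct FlushSlot {
    bool valid = false;
};

// Hypothetical per-CPU state touched by exception delivery.
struct DeliveryContextSketch {
    std::array<FlushSlot, 6> slots{};   // 0 = Fetch (youngest) .. 5 = Writeback (oldest)
    bool reservationValid = false;      // this CPU's LL/SC reservation
    std::uint64_t pc = 0;

    void deliver(std::size_t faultingStage, std::uint64_t handlerPc) {
        assert(faultingStage < slots.size());
        for (std::size_t s = 0; s < faultingStage; ++s) {
            slots[s].valid = false;          // younger instructions never commit
        }
        slots[faultingStage].valid = false;  // assumption: the faulting slot is discarded too
        reservationValid = false;            // LL/SC reservation is cleared
        pc = handlerPc;                      // vector to the PAL or kernel handler
    }
};
```

Slots older than the faulting stage are left untouched, so they still reach writeback and commit in program order.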

 


 

8.3 Interrupts and IPIs

 

Interrupts are:

 

Sampled during pre-cycle checks

Delivered only when enabled and safe

Integrated with IPIs via IRQ-class events

 


 

9. PAL and Privilege Boundary

 

PAL execution is fully integrated into the pipeline.

 

Properties:

 

CALL_PAL acts as a serialization point

No speculative execution past PAL entry

PAL state is stored in `IPRStorage_Hot`

HW_REI restores prior execution context

 

This preserves precise exception and interrupt semantics.

 


 

10. SMP Architecture

 

10.1 CPU Identity

 

Maximum CPUs: 64

CPU IDs are typed (`CPUIdType`, `quint16`)

CPUs are created and owned by the SMP manager
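
A sketch of how a typed CPU ID and the 64-CPU limit might be expressed. To stay self-contained it uses `std::uint16_t` in place of Qt's `quint16`; treating `CPUIdType` as a plain alias is an assumption, and `kMaxCpus`, `isValidCpuId`, and `cpuBit` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Qt-based type named in the text (quint16 is 16-bit unsigned).
using CPUIdType = std::uint16_t;

constexpr CPUIdType kMaxCpus = 64;   // limit stated in this section

// Valid IDs are 0 .. kMaxCpus - 1.
constexpr bool isValidCpuId(CPUIdType id) { return id < kMaxCpus; }

// With at most 64 CPUs, one 64-bit word holds one bit per CPU, which is a
// convenient way to name a set of targets (illustrative use only).
inline std::uint64_t cpuBit(CPUIdType id) {
    assert(isValidCpuId(id));
    return std::uint64_t{1} << id;
}
```

A set of target CPUs for an IPI or TLB shootdown can then be built by OR-ing `cpuBit()` values together.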

 


 

10.2 Shared vs Private State

 

Each CPU owns:

 

Its pipeline

Register files

TLBs and caches

 

All CPUs share:

 

GuestMemory

MMIO devices

Coherency and IPI infrastructure

 


 

10.3 Inter-Processor Coordination

 

SMP coordination includes:

 

IPIs delivered via IRQ mechanisms

TLB shootdowns

Barrier synchronization

 

GuestMemory remains the coherence authority.

 


 

11. Devices and MMIO

 

MMIO design follows real hardware principles:

 

MMIO register accesses are synchronous and strongly ordered

Device operations execute asynchronously

Completion is signaled via interrupts or polling

 

This separation ensures correct device protocol behavior.
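
The split between synchronous register access and asynchronous device work can be sketched with a toy device. `DeviceSketch` and its register semantics are hypothetical; the `tick()` method stands in for whatever mechanism the emulator uses to advance device operations outside the guest's instruction stream.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device: register accesses take effect immediately, while the
// operation they start completes later and signals completion.
class DeviceSketch {
public:
    // Synchronous register write: starts an operation lasting `cycles` ticks.
    void writeCommand(std::uint32_t cycles) {
        remaining_ = cycles;
        busy_ = cycles > 0;
        irq_ = false;
    }

    // Synchronous register read: status is always current (polling path).
    bool readBusy() const { return busy_; }

    // Interrupt line state (interrupt path).
    bool irqPending() const { return irq_; }

    // Advances the asynchronous operation; driven by the emulator, not the guest.
    void tick() {
        if (!busy_) return;
        if (--remaining_ == 0) {
            busy_ = false;
            irq_ = true;    // completion is signaled
        }
    }

private:
    std::uint32_t remaining_ = 0;
    bool busy_ = false;
    bool irq_ = false;
};
```

A guest driver can either poll `readBusy()` or wait for the interrupt; both observe the same completion event.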

 


 

12. Architectural Invariants

 

The following invariants must always hold:

 

1. Writeback is the sole architectural commit point

2. No speculation past barriers or PAL entry

3. One LL/SC reservation per CPU

4. GuestMemory is the sole coherence authority

5. Exceptions are precise and ordered

6. PAL transitions serialize execution

 

Violating these invariants is a correctness bug.

 


 

13. Summary

 

EMulatR implements a clean, deterministic, and architecturally faithful Alpha AXP execution model.

Its pipeline is intentionally lightweight, execution-centric, and barrier-aware, enabling SMP correctness without unnecessary complexity.

 

This architecture provides a stable foundation for:

 

Instruction expansion

Device growth

Performance optimization

Long-term maintainability