Architecture Overview
This chapter provides a top-level architectural description of the EMulatR system.
Its purpose is to explain what components exist, how they interact, and how execution flows, without delving into instruction-level implementation details.
This chapter is normative.
Subsystems and future refactors must preserve the invariants described here.
EMulatR is a cycle-based, SMP-capable Alpha AXP processor emulator designed to prioritize:
• Architectural correctness
• Deterministic and debuggable execution
• Faithful modeling of Alpha’s weak memory ordering
• Clean separation of concerns between execution, memory, PAL, and devices
Each Alpha CPU instance executes independently but coordinates with others through shared memory, inter-processor interrupts, and coherency mechanisms.
Execution proceeds in a cycle-based run loop:
• One iteration of the run loop corresponds to one hardware clock cycle
• Each CPU owns its own run loop
• Forward progress occurs unless explicitly stalled by architectural conditions
The run loop coordinates:
• Pipeline advancement
• Exception and interrupt delivery
• Barrier and serialization enforcement
• SMP coordination
At a conceptual level, each cycle performs:
1. Pre-cycle checks (halt, interrupts, exceptions)
2. Pipeline advancement (writeback → fetch)
3. Barrier release evaluation
4. Branch resolution
5. Exception and interrupt delivery
6. State updates and housekeeping
Execution continues until a halt condition occurs.
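The per-cycle sequence above can be sketched as a simple skeleton. This is a hypothetical illustration; the method names and `CpuSketch` type are assumptions, not the actual EMulatR API.

```cpp
#include <cstdint>

// Hypothetical sketch of one iteration of the per-CPU run loop.
// Method names are illustrative, not the actual EMulatR API.
enum class CpuState { Running, Halted };

struct CpuSketch {
    CpuState state = CpuState::Running;
    uint64_t cycle = 0;

    bool preCycleChecks() {              // 1. halt, interrupts, exceptions
        return state == CpuState::Running;
    }
    void advancePipeline() {}            // 2. writeback -> fetch
    void evaluateBarrierRelease() {}     // 3. barrier release evaluation
    void resolveBranches() {}            // 4. branch resolution
    void deliverFaultsAndInterrupts() {} // 5. exception/interrupt delivery
    void housekeeping() { ++cycle; }     // 6. state updates

    void runOneCycle() {
        if (!preCycleChecks()) return;   // halted: no forward progress
        advancePipeline();
        evaluateBarrierRelease();
        resolveBranches();
        deliverFaultsAndInterrupts();
        housekeeping();
    }
};
```

One call to `runOneCycle` corresponds to one emulated hardware clock cycle; a halted CPU makes no forward progress.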
EMulatR models a six-stage Alpha pipeline:
| Stage | Name | Responsibility |
| ----- | --------- | --------------------------------------- |
| 0 | Fetch | Instruction fetch and frontend control |
| 1 | Decode | Grain association and validation |
| 2 | Issue | Hazard detection and operand readiness |
| 3 | Execute | All architectural work happens here |
| 4 | Memory | Advancement only (work already done) |
| 5 | Writeback | Architectural commit point |
Invariant:
Writeback (WB) is the only stage where architectural state is committed.
Each pipeline stage holds a `PipelineSlot`, which encapsulates:
• Decoded instruction (“grain”)
• Operand and result storage
• Fault and exception state
• Stall and serialization flags
Slots may be:
• Valid
• Stalled
• Flushed (invalidated)
Slots younger than a fault, branch misprediction, or PAL entry may be discarded without committing state.
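A minimal sketch of the slot layout follows. The `PipelineSlot` name comes from the text above; the individual field names are assumptions chosen to match the listed contents.

```cpp
#include <cstdint>
#include <optional>

// Illustrative layout of a pipeline slot; the PipelineSlot name comes
// from the text, but the field names here are assumptions.
struct Grain { uint32_t opcode = 0; };       // decoded instruction

struct PipelineSlot {
    bool valid = false;                      // slot holds a live grain
    bool stalled = false;                    // stall flag
    bool serializes = false;                 // serialization flag
    bool faultPending = false;               // fault/exception state
    std::optional<Grain> grain;              // decoded instruction ("grain")
    uint64_t operands[2] = {0, 0};           // operand storage
    uint64_t result = 0;                     // result storage

    void flush() {                           // invalidate without committing
        valid = false;
        grain.reset();
        faultPending = false;
    }
};
```

Flushing only clears the slot's bookkeeping; no architectural state is touched, which is what allows speculative slots to be discarded safely.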
The pipeline advances backward (WB → MEM → EX → Issue → Decode → Fetch) each cycle.
This ordering:
• Prevents write-after-read hazards
• Ensures precise exceptions
• Allows speculative slots to be safely invalidated
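The backward ordering can be sketched as follows: because writeback vacates its slot first, every younger slot moves into a stage that has already been emptied this cycle, so no slot is overwritten before it has been consumed. The `advance` helper is illustrative, not the actual implementation.

```cpp
#include <array>
#include <utility>

// Minimal sketch of backward advancement: moving slots from writeback
// toward fetch means every slot lands in a stage already vacated this
// cycle, so nothing is overwritten before it has been consumed.
struct Slot { bool valid = false; int id = -1; };

constexpr int kStages = 6;                   // 0 = Fetch .. 5 = Writeback

void advance(std::array<Slot, kStages>& pipe) {
    pipe[kStages - 1] = Slot{};              // WB commits and vacates first
    for (int s = kStages - 1; s > 0; --s)    // then MEM->WB, EX->MEM, ...
        pipe[s] = std::exchange(pipe[s - 1], Slot{});
}
```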
Execution is divided into functional domains, referred to as boxes.
Boxes are execution domains, not pipeline stages.
The IBox is responsible for:
• Instruction fetch
• Grain resolution and decode caching
• Branch target calculation
• Frontend stall enforcement
Decoding occurs once per instruction pattern; cached grains may be reused many times.
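The decode-once, reuse-many policy amounts to a cache keyed by the raw instruction word. The sketch below is a hypothetical illustration (the `DecodeCache` name and its interface are assumptions); it uses the Alpha convention that the major opcode occupies bits 31:26 of the instruction word.

```cpp
#include <cstdint>
#include <unordered_map>

// Sketch of decode caching: a raw instruction word is decoded once and
// the resulting "grain" is reused on later fetches of the same pattern.
// Class and method names are illustrative.
struct Grain {
    uint32_t raw;
    uint8_t opcode;                              // Alpha major opcode, bits 31:26
};

class DecodeCache {
public:
    const Grain& lookup(uint32_t raw) {
        auto it = cache_.find(raw);
        if (it == cache_.end()) {
            ++decodes_;                          // full decode happens only once
            it = cache_.emplace(raw, Grain{raw, uint8_t(raw >> 26)}).first;
        }
        return it->second;
    }
    uint64_t decodeCount() const { return decodes_; }

private:
    std::unordered_map<uint32_t, Grain> cache_;
    uint64_t decodes_ = 0;
};
```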
The EBox performs:
• Integer arithmetic and logic
• Address calculation
• Conditional evaluation
• Dispatch to memory and coherency services
The FBox executes floating-point operations and maintains FP state.
Execution policy:
• Floating-point operations execute synchronously in the EX stage
• The pipeline stalls in EX until the FP operation completes
• No asynchronous FP completion tracking is used
This design favors simplicity, correctness, and debuggability.
The MBox coordinates:
• Load and store execution
• DTB translation
• Cache access
• Interaction with memory barriers
All memory accesses occur during the EX stage via the MBox.
The CBox manages:
• Memory barriers (MB, WMB, EXCB, TRAPB)
• Write buffer draining
• LL/SC reservation coordination
• SMP coherency hooks
Barriers are enforced by stalling the pipeline frontend until release conditions are met.
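The stall-until-release behavior can be sketched with a write buffer as the drain condition. This is a simplified illustration under assumed names (`CBoxSketch`, `WriteBuffer`); the real release conditions vary by barrier type.

```cpp
// Sketch: an active barrier stalls the frontend until its release
// condition (here, write-buffer drain) is met. Names are illustrative.
struct WriteBuffer {
    int pending = 0;
    bool drained() const { return pending == 0; }
    void drainOne() { if (pending > 0) --pending; }
};

struct CBoxSketch {
    bool barrierActive = false;
    WriteBuffer wb;

    bool frontendStalled() const { return barrierActive && !wb.drained(); }

    void tick() {                    // called once per cycle
        wb.drainOne();
        if (barrierActive && wb.drained())
            barrierActive = false;   // release condition met: barrier lifts
    }
};
```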
The PalBox executes PAL functionality:
• CALL_PAL dispatch
• Privileged register access (MFPR, MTPR)
• Processor mode transitions
• Interrupt and exception handoff
PAL executes inside the pipeline and is treated as a serialization boundary.
The memory system is layered as follows:
• GuestMemory is the shared physical address space visible to all CPUs
• SafeMemory provides the backing store and safety mechanisms
• MMIO regions are routed through the memory system but obey stronger ordering rules
GuestMemory is the sole authority for:
• Physical memory visibility
• LL/SC reservation invalidation
• SMP coherency effects
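Centralizing reservation invalidation in GuestMemory can be sketched as below. The class and method names are assumptions, and exact-address reservation granularity is a simplification (real Alpha reserves a locked region); the point is that one shared authority clears every CPU's reservation on a conflicting store.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Sketch of per-CPU LL/SC reservations invalidated centrally on store.
// One reservation per CPU; exact-address granularity is a simplification.
constexpr int kMaxCpus = 64;

class GuestMemorySketch {
public:
    void loadLocked(int cpu, uint64_t pa) { resv_[cpu] = pa; }

    bool storeConditional(int cpu, uint64_t pa) {
        if (resv_[cpu] != pa) {      // reservation lost or never held
            resv_[cpu].reset();
            return false;
        }
        store(pa);                   // success; this store invalidates others
        return true;
    }

    void store(uint64_t pa) {        // any store clears matching reservations
        for (auto& r : resv_)
            if (r == pa) r.reset();
    }

private:
    std::array<std::optional<uint64_t>, kMaxCpus> resv_{};
};
```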
EMulatR models Alpha’s weakly ordered memory architecture with explicit serialization:
• Loads complete synchronously in EX
• Stores are buffered and may drain asynchronously
• No implicit ordering is guaranteed
Memory barriers (MB, WMB, EXCB, TRAPB):
• Stall the pipeline frontend
• Prevent speculation past the barrier
• Release only after required drain conditions are met
MMIO accesses are strongly ordered and never buffered.
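The buffered-RAM-store versus unbuffered-MMIO-store split can be sketched as a routing decision at store time. `MBoxSketch` and its interface are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Sketch: RAM stores are buffered and drain later; MMIO stores bypass
// the buffer and complete synchronously. Names are illustrative.
struct StoreOp { uint64_t pa; uint64_t val; };

class MBoxSketch {
public:
    explicit MBoxSketch(bool (*isMmio)(uint64_t)) : isMmio_(isMmio) {}

    void store(uint64_t pa, uint64_t val) {
        if (isMmio_(pa))
            commit(pa, val);                 // strongly ordered, never buffered
        else
            buf_.push_back({pa, val});       // weakly ordered, drains later
    }
    std::size_t pendingStores() const { return buf_.size(); }
    void drainOne() {
        if (!buf_.empty()) {
            commit(buf_.front().pa, buf_.front().val);
            buf_.pop_front();
        }
    }

private:
    void commit(uint64_t, uint64_t) {}       // write to backing memory/device
    bool (*isMmio_)(uint64_t);
    std::deque<StoreOp> buf_;
};
```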
Pipeline stalls may occur due to:
• Memory barriers
• Fault dispatch
• TLB misses
• Floating-point EX occupancy
• PAL entry/exit
• MMIO wait states (future)
• Branch misprediction recovery
• Speculative execution may proceed past loads
• Speculation must not proceed past barriers
• Barriers block frontend fetch until released
This policy preserves Alpha ordering semantics while allowing forward progress.
All exceptional conditions are queued and prioritized by the FaultDispatcher.
Properties:
• Precise exception delivery
• Strict priority ordering
• Single architectural delivery point
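The queue-and-prioritize behavior can be sketched with a priority queue. The specific fault kinds and their relative priorities below are assumptions for illustration, not the FaultDispatcher's actual priority table.

```cpp
#include <queue>
#include <vector>

// Sketch of fault queuing with strict priority ordering and a single
// delivery point. Fault kinds and priorities are illustrative.
enum class FaultKind { MachineCheck = 0, ArithTrap = 1, TlbMiss = 2, Interrupt = 3 };

struct Fault {
    FaultKind kind;
    bool operator>(const Fault& o) const { return kind > o.kind; }
};

class FaultDispatcherSketch {
public:
    void raise(Fault f) { q_.push(f); }
    bool pending() const { return !q_.empty(); }
    Fault deliver() {                    // highest-priority fault first
        Fault f = q_.top();
        q_.pop();
        return f;
    }

private:
    // min-heap on the enum value: lower value = higher priority
    std::priority_queue<Fault, std::vector<Fault>, std::greater<Fault>> q_;
};
```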
When an exception is delivered:
• All younger pipeline slots are flushed
• Architectural state is preserved up to WB
• LL/SC reservations are cleared
• Execution vectors to PAL or kernel handlers
Interrupts are:
• Sampled during pre-cycle checks
• Delivered only when enabled and safe
• Integrated with IPIs via IRQ-class events
PAL execution is fully integrated into the pipeline.
Properties:
• CALL_PAL acts as a serialization point
• No speculative execution past PAL entry
• PAL state is stored in `IPRStorage_Hot`
• HW_REI restores prior execution context
This preserves precise exception and interrupt semantics.
• Maximum CPUs: 64
• CPU IDs are typed (`CPUIdType`, `quint16`)
• CPUs are created and owned by the SMP manager
Each CPU owns:
• Its pipeline
• Register files
• TLBs and caches
All CPUs share:
• GuestMemory
• MMIO devices
• Coherency and IPI infrastructure
SMP coordination includes:
• IPIs delivered via IRQ mechanisms
• TLB shootdowns
• Barrier synchronization
GuestMemory remains the coherence authority.
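IPI delivery via the pre-cycle sampling point can be sketched as a pending-bit fabric. The `CPUIdType` alias and 64-CPU limit follow the text; the `IpiFabric` mechanism itself is an assumption for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Sketch of IPI delivery: a sender sets a pending bit that the target
// CPU samples (and clears) during its pre-cycle checks.
using CPUIdType = uint16_t;              // quint16 in the Qt-based codebase
constexpr CPUIdType kMaxCpus = 64;

class IpiFabric {
public:
    void send(CPUIdType target) {
        pending_[target].store(true, std::memory_order_release);
    }
    bool sample(CPUIdType self) {        // called from pre-cycle checks
        return pending_[self].exchange(false, std::memory_order_acquire);
    }

private:
    std::array<std::atomic<bool>, kMaxCpus> pending_{};
};
```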
MMIO design follows real hardware principles:
• MMIO register accesses are synchronous and strongly ordered
• Device operations execute asynchronously
• Completion is signaled via interrupts or polling
This separation ensures correct device protocol behavior.
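The synchronous-register/asynchronous-operation split can be sketched with a device whose "go" register write returns immediately while completion arrives on a later cycle via an interrupt. The device, its register layout, and the callback wiring are all hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Sketch: a register write completes synchronously (strongly ordered),
// while the device operation it starts finishes asynchronously and
// signals completion by raising an interrupt. All names are illustrative.
class DiskDeviceSketch {
public:
    explicit DiskDeviceSketch(std::function<void()> raiseIrq)
        : raiseIrq_(std::move(raiseIrq)) {}

    void writeRegister(uint32_t reg, uint64_t /*val*/) {  // synchronous MMIO
        if (reg == kGoRegister) busy_ = true;             // operation starts...
    }
    void tick() {                        // ...and completes on a later cycle
        if (busy_) {
            busy_ = false;
            raiseIrq_();                 // completion signaled via interrupt
        }
    }
    bool busy() const { return busy_; }

private:
    static constexpr uint32_t kGoRegister = 0x10;
    bool busy_ = false;
    std::function<void()> raiseIrq_;
};
```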
The following invariants must always hold:
1. Writeback is the sole architectural commit point
2. No speculation past barriers or PAL entry
3. One LL/SC reservation per CPU
4. GuestMemory is the sole coherence authority
5. Exceptions are precise and ordered
6. PAL transitions serialize execution
Violating these invariants is a correctness bug.
EMulatR implements a clean, deterministic, and architecturally faithful Alpha AXP execution model.
Its pipeline is intentionally lightweight, execution-centric, and barrier-aware, enabling SMP correctness without unnecessary complexity.
This architecture provides a stable foundation for:
• Instruction expansion
• Device growth
• Performance optimization
• Long-term maintainability