Architecture Overview
This chapter provides a top-level architectural description of the EMulatR system.
Its purpose is to explain what components exist, how they interact, and how execution flows, without delving into instruction-level implementation details.
This chapter is normative.
Subsystems and future refactors must preserve the invariants described here.
EMulatR is a cycle-based, SMP-capable Alpha AXP processor emulator designed to prioritize:
• Architectural correctness
• Deterministic and debuggable execution
• Faithful modeling of Alpha’s weak memory ordering
• Clean separation of concerns between execution, memory, PAL, and devices
Each Alpha CPU instance executes independently but coordinates with others through shared memory, inter-processor interrupts, and coherency mechanisms.
Execution proceeds in a cycle-based run loop:
• One iteration of the run loop corresponds to one hardware clock cycle
• Each CPU owns its own run loop
• Forward progress occurs unless explicitly stalled by architectural conditions
The run loop coordinates:
• Pipeline advancement
• Exception and interrupt delivery
• Barrier and serialization enforcement
• SMP coordination
At a conceptual level, each cycle performs:
1. Pre-cycle checks (halt, interrupts, exceptions)
2. Pipeline advancement (writeback → fetch)
3. Barrier release evaluation
4. Branch resolution
5. Exception and interrupt delivery
6. State updates and housekeeping
Execution continues until a halt condition occurs.
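The per-cycle sequence above can be sketched as a simple skeleton. This is a hypothetical illustration; the method names and `CpuSketch` type are assumptions, not the actual EMulatR API.

```cpp
#include <cstdint>

// Hypothetical sketch of one iteration of the per-CPU run loop.
// Method names are illustrative, not the actual EMulatR API.
enum class CpuState { Running, Halted };

struct CpuSketch {
    CpuState state = CpuState::Running;
    uint64_t cycle = 0;

    bool preCycleChecks() {              // 1. halt, interrupts, exceptions
        return state == CpuState::Running;
    }
    void advancePipeline() {}            // 2. writeback -> fetch
    void evaluateBarrierRelease() {}     // 3. barrier release evaluation
    void resolveBranches() {}            // 4. branch resolution
    void deliverFaultsAndInterrupts() {} // 5. exception/interrupt delivery
    void housekeeping() { ++cycle; }     // 6. state updates

    void runOneCycle() {
        if (!preCycleChecks()) return;   // halted: no forward progress
        advancePipeline();
        evaluateBarrierRelease();
        resolveBranches();
        deliverFaultsAndInterrupts();
        housekeeping();
    }
};
```

One call to `runOneCycle` corresponds to one emulated hardware clock cycle; a halted CPU makes no forward progress.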
EMulatR models a six-stage Alpha pipeline:
| Stage | Name | Responsibility |
| ----- | --------- | --------------------------------------- |
| 0 | Fetch | Instruction fetch and frontend control |
| 1 | Decode | Grain association and validation |
| 2 | Issue | Hazard detection and operand readiness |
| 3 | Execute | All architectural work happens here |
| 4 | Memory | Advancement only (work already done) |
| 5 | Writeback | Architectural commit point |
Invariant:
Writeback (WB) is the only stage where architectural state is committed.
Each pipeline stage holds a `PipelineSlot`, which encapsulates:
• Decoded instruction (“grain”)
• Operand and result storage
• Fault and exception state
• Stall and serialization flags
Slots may be:
• Valid
• Stalled
• Flushed (invalidated)
Slots younger than a fault, branch misprediction, or PAL entry may be discarded without committing state.
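A minimal sketch of the slot layout follows. The `PipelineSlot` name comes from the text above; the individual field names are assumptions chosen to match the listed contents.

```cpp
#include <cstdint>
#include <optional>

// Illustrative layout of a pipeline slot; the PipelineSlot name comes
// from the text, but the field names here are assumptions.
struct Grain { uint32_t opcode = 0; };       // decoded instruction

struct PipelineSlot {
    bool valid = false;                      // slot holds a live grain
    bool stalled = false;                    // stall flag
    bool serializes = false;                 // serialization flag
    bool faultPending = false;               // fault/exception state
    std::optional<Grain> grain;              // decoded instruction ("grain")
    uint64_t operands[2] = {0, 0};           // operand storage
    uint64_t result = 0;                     // result storage

    void flush() {                           // invalidate without committing
        valid = false;
        grain.reset();
        faultPending = false;
    }
};
```

Flushing only clears the slot's bookkeeping; no architectural state is touched, which is what allows speculative slots to be discarded safely.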
The pipeline advances backward (WB → MEM → EX → Issue → Decode → Fetch) each cycle.
This ordering:
• Prevents write-after-read hazards
• Ensures precise exceptions
• Allows speculative slots to be safely invalidated
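The backward ordering can be sketched as follows: because writeback vacates its slot first, every younger slot moves into a stage that has already been emptied this cycle, so no slot is overwritten before it has been consumed. The `advance` helper is illustrative, not the actual implementation.

```cpp
#include <array>
#include <utility>

// Minimal sketch of backward advancement: moving slots from writeback
// toward fetch means every slot lands in a stage already vacated this
// cycle, so nothing is overwritten before it has been consumed.
struct Slot { bool valid = false; int id = -1; };

constexpr int kStages = 6;                   // 0 = Fetch .. 5 = Writeback

void advance(std::array<Slot, kStages>& pipe) {
    pipe[kStages - 1] = Slot{};              // WB commits and vacates first
    for (int s = kStages - 1; s > 0; --s)    // then MEM->WB, EX->MEM, ...
        pipe[s] = std::exchange(pipe[s - 1], Slot{});
}
```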
Execution is divided into functional domains, referred to as boxes.
Boxes are execution domains, not pipeline stages.
The IBox is responsible for:
• Instruction fetch
• Grain resolution and decode caching
• Branch target calculation
• Frontend stall enforcement
Decoding occurs once per instruction pattern; cached grains may be reused many times.
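The decode-once, reuse-many policy amounts to a cache keyed by the raw instruction word. The sketch below is a hypothetical illustration (the `DecodeCache` name and its interface are assumptions); it uses the Alpha convention that the major opcode occupies bits 31:26 of the instruction word.

```cpp
#include <cstdint>
#include <unordered_map>

// Sketch of decode caching: a raw instruction word is decoded once and
// the resulting "grain" is reused on later fetches of the same pattern.
// Class and method names are illustrative.
struct Grain {
    uint32_t raw;
    uint8_t opcode;                              // Alpha major opcode, bits 31:26
};

class DecodeCache {
public:
    const Grain& lookup(uint32_t raw) {
        auto it = cache_.find(raw);
        if (it == cache_.end()) {
            ++decodes_;                          // full decode happens only once
            it = cache_.emplace(raw, Grain{raw, uint8_t(raw >> 26)}).first;
        }
        return it->second;
    }
    uint64_t decodeCount() const { return decodes_; }

private:
    std::unordered_map<uint32_t, Grain> cache_;
    uint64_t decodes_ = 0;
};
```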
The EBox performs:
• Integer arithmetic and logic
• Address calculation
• Conditional evaluation
• Dispatch to memory and coherency services
The FBox executes floating-point operations and maintains FP state.
Execution policy:
• Floating-point operations execute synchronously in the EX stage
• The pipeline stalls in EX until the FP operation completes
• No asynchronous FP completion tracking is used
This design favors simplicity, correctness, and debuggability.
The MBox coordinates:
• Load and store execution
• DTB translation
• Cache access
• Interaction with memory barriers
All memory accesses occur during the EX stage via the MBox.
The CBox manages:
• Memory barriers (MB, WMB, EXCB, TRAPB)
• Write buffer draining
• LL/SC reservation coordination
• SMP coherency hooks
Barriers are enforced by stalling the pipeline frontend until release conditions are met.
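The stall-until-release behavior can be sketched with a write buffer as the drain condition. This is a simplified illustration under assumed names (`CBoxSketch`, `WriteBuffer`); the real release conditions vary by barrier type.

```cpp
// Sketch: an active barrier stalls the frontend until its release
// condition (here, write-buffer drain) is met. Names are illustrative.
struct WriteBuffer {
    int pending = 0;
    bool drained() const { return pending == 0; }
    void drainOne() { if (pending > 0) --pending; }
};

struct CBoxSketch {
    bool barrierActive = false;
    WriteBuffer wb;

    bool frontendStalled() const { return barrierActive && !wb.drained(); }

    void tick() {                    // called once per cycle
        wb.drainOne();
        if (barrierActive && wb.drained())
            barrierActive = false;   // release condition met: barrier lifts
    }
};
```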
The PalBox executes PAL functionality:
• CALL_PAL dispatch
• Privileged register access (MFPR, MTPR)
• Processor mode transitions
• Interrupt and exception handoff
PAL executes inside the pipeline and is treated as a serialization boundary.
The memory system is layered as follows:
• GuestMemory is the shared physical address space visible to all CPUs
• SafeMemory provides the backing store and safety mechanisms
• MMIO regions are routed through the memory system but obey stronger ordering rules
GuestMemory is the sole authority for:
• Physical memory visibility
• LL/SC reservation invalidation
• SMP coherency effects
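Centralizing reservation invalidation in GuestMemory can be sketched as below. The class and method names are assumptions, and exact-address reservation granularity is a simplification (real Alpha reserves a locked region); the point is that one shared authority clears every CPU's reservation on a conflicting store.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Sketch of per-CPU LL/SC reservations invalidated centrally on store.
// One reservation per CPU; exact-address granularity is a simplification.
constexpr int kMaxCpus = 64;

class GuestMemorySketch {
public:
    void loadLocked(int cpu, uint64_t pa) { resv_[cpu] = pa; }

    bool storeConditional(int cpu, uint64_t pa) {
        if (resv_[cpu] != pa) {      // reservation lost or never held
            resv_[cpu].reset();
            return false;
        }
        store(pa);                   // success; this store invalidates others
        return true;
    }

    void store(uint64_t pa) {        // any store clears matching reservations
        for (auto& r : resv_)
            if (r == pa) r.reset();
    }

private:
    std::array<std::optional<uint64_t>, kMaxCpus> resv_{};
};
```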
EMulatR models Alpha’s weakly ordered memory architecture with explicit serialization:
• Loads complete synchronously in EX
• Stores are buffered and may drain asynchronously
• No implicit ordering is guaranteed
Memory barriers (MB, WMB, EXCB, TRAPB):
• Stall the pipeline frontend
• Prevent speculation past the barrier
• Release only after required drain conditions are met
MMIO accesses are strongly ordered and never buffered.
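The buffered-RAM-store versus unbuffered-MMIO-store split can be sketched as a routing decision at store time. `MBoxSketch` and its interface are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Sketch: RAM stores are buffered and drain later; MMIO stores bypass
// the buffer and complete synchronously. Names are illustrative.
struct StoreOp { uint64_t pa; uint64_t val; };

class MBoxSketch {
public:
    explicit MBoxSketch(bool (*isMmio)(uint64_t)) : isMmio_(isMmio) {}

    void store(uint64_t pa, uint64_t val) {
        if (isMmio_(pa))
            commit(pa, val);                 // strongly ordered, never buffered
        else
            buf_.push_back({pa, val});       // weakly ordered, drains later
    }
    std::size_t pendingStores() const { return buf_.size(); }
    void drainOne() {
        if (!buf_.empty()) {
            commit(buf_.front().pa, buf_.front().val);
            buf_.pop_front();
        }
    }

private:
    void commit(uint64_t, uint64_t) {}       // write to backing memory/device
    bool (*isMmio_)(uint64_t);
    std::deque<StoreOp> buf_;
};
```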
Pipeline stalls may occur due to:
• Memory barriers
• Fault dispatch
• TLB misses
• Floating-point EX occupancy
• PAL entry/exit
• MMIO wait states (future)
• Branch misprediction recovery
• Speculative execution may proceed past loads
• Speculation must not proceed past barriers
• Barriers block frontend fetch until released
This policy preserves Alpha ordering semantics while allowing forward progress.
All exceptional conditions are queued and prioritized by the FaultDispatcher.
Properties:
• Precise exception delivery
• Strict priority ordering
• Single architectural delivery point
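The queue-and-prioritize behavior can be sketched with a priority queue. The specific fault kinds and their relative priorities below are assumptions for illustration, not the FaultDispatcher's actual priority table.

```cpp
#include <queue>
#include <vector>

// Sketch of fault queuing with strict priority ordering and a single
// delivery point. Fault kinds and priorities are illustrative.
enum class FaultKind { MachineCheck = 0, ArithTrap = 1, TlbMiss = 2, Interrupt = 3 };

struct Fault {
    FaultKind kind;
    bool operator>(const Fault& o) const { return kind > o.kind; }
};

class FaultDispatcherSketch {
public:
    void raise(Fault f) { q_.push(f); }
    bool pending() const { return !q_.empty(); }
    Fault deliver() {                    // highest-priority fault first
        Fault f = q_.top();
        q_.pop();
        return f;
    }

private:
    // min-heap on the enum value: lower value = higher priority
    std::priority_queue<Fault, std::vector<Fault>, std::greater<Fault>> q_;
};
```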
When an exception is delivered:
• All younger pipeline slots are flushed
• Architectural state is preserved up to WB
• LL/SC reservations are cleared
• Execution vectors to PAL or kernel handlers
Interrupts are:
• Sampled during pre-cycle checks
• Delivered only when enabled and safe
• Integrated with IPIs via IRQ-class events
PAL execution is fully integrated into the pipeline.
Properties:
• CALL_PAL acts as a serialization point
• No speculative execution past PAL entry
• PAL state is stored in `IPRStorage_Hot`
• HW_REI restores prior execution context
This preserves precise exception and interrupt semantics.
• Maximum CPUs: 64
• CPU IDs are typed (`CPUIdType`, `quint16`)
• CPUs are created and owned by the SMP manager
Each CPU owns:
• Its pipeline
• Register files
• TLBs and caches
All CPUs share:
• GuestMemory
• MMIO devices
• Coherency and IPI infrastructure
SMP coordination includes:
• IPIs delivered via IRQ mechanisms
• TLB shootdowns
• Barrier synchronization
GuestMemory remains the coherence authority.
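IPI delivery via the pre-cycle sampling point can be sketched as a pending-bit fabric. The `CPUIdType` alias and 64-CPU limit follow the text; the `IpiFabric` mechanism itself is an assumption for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Sketch of IPI delivery: a sender sets a pending bit that the target
// CPU samples (and clears) during its pre-cycle checks.
using CPUIdType = uint16_t;              // quint16 in the Qt-based codebase
constexpr CPUIdType kMaxCpus = 64;

class IpiFabric {
public:
    void send(CPUIdType target) {
        pending_[target].store(true, std::memory_order_release);
    }
    bool sample(CPUIdType self) {        // called from pre-cycle checks
        return pending_[self].exchange(false, std::memory_order_acquire);
    }

private:
    std::array<std::atomic<bool>, kMaxCpus> pending_{};
};
```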
MMIO design follows real hardware principles:
• MMIO register accesses are synchronous and strongly ordered
• Device operations execute asynchronously
• Completion is signaled via interrupts or polling
This separation ensures correct device protocol behavior.
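The synchronous-register/asynchronous-operation split can be sketched with a device whose "go" register write returns immediately while completion arrives on a later cycle via an interrupt. The device, its register layout, and the callback wiring are all hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Sketch: a register write completes synchronously (strongly ordered),
// while the device operation it starts finishes asynchronously and
// signals completion by raising an interrupt. All names are illustrative.
class DiskDeviceSketch {
public:
    explicit DiskDeviceSketch(std::function<void()> raiseIrq)
        : raiseIrq_(std::move(raiseIrq)) {}

    void writeRegister(uint32_t reg, uint64_t /*val*/) {  // synchronous MMIO
        if (reg == kGoRegister) busy_ = true;             // operation starts...
    }
    void tick() {                        // ...and completes on a later cycle
        if (busy_) {
            busy_ = false;
            raiseIrq_();                 // completion signaled via interrupt
        }
    }
    bool busy() const { return busy_; }

private:
    static constexpr uint32_t kGoRegister = 0x10;
    bool busy_ = false;
    std::function<void()> raiseIrq_;
};
```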
The following invariants must always hold:
1. Writeback is the sole architectural commit point
2. No speculation past barriers or PAL entry
3. One LL/SC reservation per CPU
4. GuestMemory is the sole coherence authority
5. Exceptions are precise and ordered
6. PAL transitions serialize execution
Violating these invariants is a correctness bug.
EMulatR implements a clean, deterministic, and architecturally faithful Alpha AXP execution model.
Its pipeline is intentionally lightweight, execution-centric, and barrier-aware, enabling SMP correctness without unnecessary complexity.
This architecture provides a stable foundation for:
• Instruction expansion
• Device growth
• Performance optimization
• Long-term maintainability