Appendix K – Pipeline Retirement Mechanics


Pipeline retirement is the architectural commit point — the moment an instruction's effects become permanently visible to all observers. This appendix provides the authoritative reference for the retirement contract, the deferred writeback mechanism, and the three retirement outcomes (normal commit, fault delivery, PAL transfer). Chapter 13.6.6 describes the stage_WB() execution order; this appendix explains why each step exists and what invariants it preserves.

 


 

K.1 What "Architecturally Committed" Means

 

An instruction is architecturally committed when all three conditions hold:

 

Visible — the instruction's register writes are in the architectural register file, and its store (if any) is in SafeMemory. Any subsequent instruction reading the same register or memory address will observe the committed value.

 

Irreversible — the instruction's effects cannot be undone by any future pipeline event (flush, fault, interrupt). Once committed, the instruction is part of the permanent execution history.

 

Precise — all instructions older than this one have already committed (in program order). No instruction younger than this one has committed. The PC reflects the committed instruction stream exactly.

 

Retirement is the only point where these three conditions become true simultaneously. Prior to retirement, an instruction's results are speculative — they may be discarded on flush.

 


 

K.2 Deferred Writeback — The PendingCommit Mechanism

 

The problem: When a grain executes in stage_EX(), it produces a result (slot.payLoad). That result must eventually reach the architectural register file. But writing immediately in EX would violate precise exceptions — if the instruction later faults (or a younger instruction faults first), the register write cannot be undone.

 

The solution: Deferred writeback. The result is captured in a PendingCommit structure and written to the register file one cycle later by commitPending(), which runs at the top of stage_WB(). This separates result production (EX) from result publication (commitPending), allowing faults to discard uncommitted results.

 

K.2.1 PendingCommit Structure

 

struct PendingCommit {
    bool    intValid{false};      // Integer result pending
    quint8  intReg{0};            // Destination register index
    quint64 intValue{0};          // Result value
    bool    intClearDirty{false}; // Clear EBox scoreboard on commit

    bool    fpValid{false};       // FP result pending
    quint8  fpReg{0};             // Destination register index
    quint64 fpValue{0};           // Result value
    bool    fpClearDirty{false};  // Clear FBox scoreboard on commit
};

 

PendingCommit holds at most one integer result and one FP result. The intClearDirty/fpClearDirty flags trigger scoreboard cleanup on commit — when the register file is written, the corresponding box's dirty bit is cleared, signaling that the register is no longer in-flight.

 

K.2.2 Deferred Writeback Lifecycle

 

Cycle N:
    stage_EX executes instruction A → grain produces result
    deferWriteback() captures result into m_pending
        m_pending.intValid = true
        m_pending.intReg = destReg
        m_pending.intValue = slot.payLoad

Cycle N+1:
    commitPending() writes A's result → register file updated
        m_intRegs->write(intReg, intValue)
        m_eBox->clearDirty(intReg)
        m_pending.intValid = false
    stage_EX executes instruction B → reads correct value from A
    deferWriteback() captures B's result into m_pending

 

Critical ordering invariant: commitPending() runs at the top of stage_WB(), which executes before stage_EX() in the same cycle (reverse stage order: WB→MEM→EX→IS→DE→IF). This guarantees that the previous cycle's deferred result is visible in the register file before the current cycle's instruction reads registers. As a result, no forwarding logic is needed, and RAW hazards on that result cause no pipeline stalls.
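The lifecycle and the ordering invariant can be illustrated with a minimal, self-contained sketch. It is not the emulator's code: MiniPipeline, MiniInstr, tick(), and the use of std::uint64_t in place of the Qt integer types are assumptions made for illustration. Only the names PendingCommit, deferWriteback, and commitPending come from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reduced PendingCommit: integer path only (K.2.1 shows the full structure).
struct PendingCommit {
    bool intValid{false};
    std::uint8_t intReg{0};
    std::uint64_t intValue{0};
};

// Hypothetical instruction for the demo: dest = src + imm.
struct MiniInstr {
    std::uint8_t dest;
    std::uint8_t src;
    std::uint64_t imm;
};

class MiniPipeline {
public:
    // One cycle. commitPending() runs first, mirroring the reverse stage
    // order in which stage_WB() executes before stage_EX().
    void tick(const MiniInstr& instr) {
        commitPending();
        const std::uint64_t result = readReg(instr.src) + instr.imm;
        deferWriteback(instr.dest, result);
    }

    // Publish the previous cycle's deferred result, if any.
    void commitPending() {
        if (!m_pending.intValid) return;
        m_regs[m_pending.intReg] = m_pending.intValue;
        m_pending.intValid = false;
    }

    // Drop an uncommitted result (fault or PAL transfer path).
    void discardPending() { m_pending = PendingCommit{}; }

    std::uint64_t readReg(std::uint8_t r) const {
        assert(r < 32);
        return r == 31 ? 0 : m_regs[r];  // R31 always reads as zero
    }

private:
    void deferWriteback(std::uint8_t dest, std::uint64_t value) {
        assert(dest < 32);
        if (dest == 31) return;  // writes to R31 produce no register result
        m_pending.intValid = true;
        m_pending.intReg = dest;
        m_pending.intValue = value;
    }

    std::array<std::uint64_t, 32> m_regs{};
    PendingCommit m_pending{};
};
```

Feeding instruction A (R1 = R31 + 5) and then instruction B (R2 = R1 + 10) through tick() shows B reading 5 from R1 without any forwarding path, because A's result is committed at the start of B's cycle.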

 

K.2.3 Writeback Paths

 

deferWriteback() handles three distinct writeback paths:

 

Link register (BSR/JSR) — Ra = PC + 4. The return address is computed from the slot's instruction PC and written to the link register.

 

Integer/FP result — slot.payLoad written to the destination register identified by the decoded instruction. The destination type (integer vs FP) is determined by destIsFloat(slot.di).

 

No writeback — stores, branches (without link), barriers, and instructions writing to R31/F31 produce no register result. deferWriteback() is a no-op for these.
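The three paths can be sketched as a single selection function. This is an illustrative reduction, not the emulator's deferWriteback(): MiniDecoded, Writeback, selectWriteback, and the writesLink/writesResult flags are hypothetical names. The destIsFloat distinction and the R31/F31 rule come from the text above.

```cpp
#include <cassert>
#include <cstdint>

enum class WritebackPath { None, LinkRegister, IntResult, FpResult };

// Hypothetical, reduced decode record; the real slot.di carries far more state.
struct MiniDecoded {
    bool writesLink{false};    // BSR/JSR-style link write
    bool writesResult{false};  // produces an integer or FP register result
    bool destIsFloat{false};   // destination lives in the FP register file
    std::uint8_t destReg{31};
    std::uint64_t pc{0};
};

struct Writeback {
    WritebackPath path{WritebackPath::None};
    std::uint8_t reg{31};
    std::uint64_t value{0};
};

// Decide which of the three writeback paths applies to one instruction.
Writeback selectWriteback(const MiniDecoded& di, std::uint64_t payload) {
    assert(di.destReg < 32);
    if (di.destReg == 31) return {};  // R31/F31: result is dropped
    if (di.writesLink)
        return {WritebackPath::LinkRegister, di.destReg, di.pc + 4};
    if (di.writesResult)
        return {di.destIsFloat ? WritebackPath::FpResult : WritebackPath::IntResult,
                di.destReg, payload};
    return {};  // stores, branches without link, barriers
}
```

Returning a value rather than writing m_pending directly keeps the classification testable in isolation. A caller would copy the result into the integer or FP half of PendingCommit.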

 

PAL instructions bypass the deferred writeback mechanism entirely. CALL_PAL is detected in stage_WB() before commitPrevious() would apply — the pending result from a younger instruction is discarded. PAL functions that modify registers do so directly through PalBox, which writes architectural state as part of the PAL function execution.

 


 

K.3 The stage_WB() Retirement Sequence

 

stage_WB() executes a strict eight-step sequence (Steps 0 through 7). The ordering is architecturally significant: reordering any step would break correctness.

 

Step 0 — commitPending(). Writes the deferred register result from the previous cycle to the architectural register file. This runs unconditionally, before the slot validity check. The pending result is from a different (older) instruction that already passed its own fault checks in EX. That instruction's result is architecturally valid regardless of what happens to the instruction currently in WB.

 

Step 1 — Slot validity check. If (!slot.valid), return. An empty slot produces no retirement.

 

Step 2 — Fault check (highest priority). If slot.faultPending is true: discard m_pending (the younger instruction in MEM is squashed — its deferred result must never reach the register file), set PipelineAction::FAULT with trapCode/faultVA/faultPC, invalidate the slot, return immediately. The fault propagates to AlphaCPU via BoxResult::faultDispatched(), which triggers enterPal().

 

Step 3 — CALL_PAL check (before store commit). If isCallPal(slot.di): discard m_pending (pipeline serializes — all younger instructions will be flushed), compute PAL entry vector via computeCallPalEntry(m_cpuId, palFunction), set PipelineAction::PAL_CALL with palFunction/callPC/palVector, invalidate the slot, return. AlphaCPU flushes the entire pipeline and enters PAL mode.

 

Step 4 — Store commit. If the instruction has store semantics (S_Store): m_guestMemory->write64(slot.pa, slot.payLoad), then m_reservationManager->breakReservationsOnCacheLine(slot.pa). This is the only point where store data reaches SafeMemory. Stores that faulted (Step 2) or were superseded by CALL_PAL (Step 3) never reach this step.

 

Step 5 — Branch predictor update. If slot.branchTaken: m_cBox->updatePrediction(slot.di.pc, slot.branchTaken, slot.branchTarget). Predictor training occurs at retirement to ensure only committed branch outcomes update the predictor. Speculative branches that were flushed never train the predictor.

 

Step 6 — Retirement. commitInstruction(slot): increment m_instructionsRetired, update m_totalCycles, emit EXECTRACE_WB_RETIRE. This is the permanent record of the instruction's execution.

 

Step 7 — Cleanup. slot.valid = false, slot.clear(). The slot is recycled for the next instruction entering the pipeline.
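The step ordering can be made concrete with a small control-flow sketch. To stay neutral about the real side effects, it only records which actions would run, in order. MiniSlot, MiniWB, the trace strings, and PipelineAction::NONE (used here for the empty-slot case) are assumptions. FAULT, PAL_CALL, and ADVANCE are the action names used in this appendix.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class PipelineAction { NONE, ADVANCE, FAULT, PAL_CALL };

// Hypothetical, reduced WB slot.
struct MiniSlot {
    bool valid{false};
    bool faultPending{false};
    bool isCallPal{false};
    bool isStore{false};
    bool branchTaken{false};
};

// Records side effects instead of performing them.
struct MiniWB {
    std::vector<std::string> trace;
    unsigned instructionsRetired{0};

    PipelineAction stageWB(MiniSlot& slot) {
        trace.push_back("commitPending");              // Step 0: unconditional
        if (!slot.valid) return PipelineAction::NONE;  // Step 1: empty slot
        if (slot.faultPending) {                       // Step 2: fault first
            trace.push_back("discardPending");
            slot = MiniSlot{};
            return PipelineAction::FAULT;
        }
        if (slot.isCallPal) {                          // Step 3: before any store
            trace.push_back("discardPending");
            slot = MiniSlot{};
            return PipelineAction::PAL_CALL;
        }
        if (slot.isStore) trace.push_back("storeCommit");           // Step 4
        if (slot.branchTaken) trace.push_back("updatePrediction");  // Step 5
        ++instructionsRetired;                                      // Step 6
        trace.push_back("retire");
        slot = MiniSlot{};                                          // Step 7
        assert(!slot.valid);  // slot is recycled after retirement
        return PipelineAction::ADVANCE;
    }
};
```

Running a faulting store through stageWB() leaves "storeCommit" out of the trace and leaves instructionsRetired at zero, which is the behavior the step ordering is meant to guarantee.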

 


 

K.4 The Three Retirement Outcomes

 

Every instruction that reaches stage_WB() with slot.valid = true produces exactly one of three outcomes. There is no fourth case.

 

Outcome       | PipelineAction | PendingCommit                              | Pipeline Effect
Fault         | FAULT          | Discarded (younger squashed)               | Full flush, enter PAL at fault vector
PAL Transfer  | PAL_CALL       | Discarded (pipeline serializes)            | Full flush, enter PAL at CALL_PAL vector
Normal Commit | ADVANCE        | Committed (safe: older instruction passed) | Store written, predictor trained, instruction retired

 

PendingCommit disposition is the critical distinction: On fault or PAL transfer, m_pending is discarded because it contains a result from a younger instruction that must be squashed. On normal commit, m_pending is safe to commit because the producing instruction already passed its own fault checks. The pending result and the retiring instruction are from different instructions — the pending is always one instruction older.

 


 

K.5 Fault Handling at Retirement

 

Faults are detected in stage_EX() but delivered in stage_WB(). The delay ensures all older instructions have retired before the fault is delivered (precise exception guarantee). The fault handling sequence at retirement:

 

1. slot.faultPending is true — set in stage_EX() when a box detected an exception (TLB miss, access violation, arithmetic trap, alignment fault, illegal instruction).

 

2. m_pending is discarded. The pending result is from the instruction that was in EX when the faulting instruction was in MEM — one stage younger. That younger instruction's result must not reach the register file.

 

3. PipelineAction::FAULT is set with three fields: trapCode (identifies the exception class — DTBM_SINGLE, ITB_MISS, DFAULT, ACCESS_VIOLATION, ARITH, ILLEGAL_INSTRUCTION, etc.), faultVA (the virtual address that triggered the fault, for TLB/memory faults), faultPC (the PC of the faulting instruction, saved to EXC_ADDR for restart).

 

4. The slot is invalidated. No store commit, no predictor update, no retirement counter increment.

 

5. BoxResult::faultDispatched() propagates to AlphaCPU. AlphaCPU calls flush() (discarding IF/DE/IS/EX/MEM), then enterPal() with the fault vector. EXC_ADDR is set to faultPC. PAL mode begins at the fault handler entry point.

 

Fault precedence: If multiple slots have faultPending set, the oldest instruction (lowest slotSequence) faults first. Because stage_WB() processes only the oldest slot each cycle, and fault delivery flushes all younger slots, at most one fault is delivered per cycle. Younger faulting instructions are discarded — their faults are artifacts of speculative execution past the true fault point.

 

Store isolation: The fault check (Step 2) executes before the store commit (Step 4). A faulting store instruction never writes to SafeMemory. A faulting load instruction's result never reaches the register file (its deferWriteback() populated m_pending, but the pending is discarded in the next cycle when the fault reaches WB). This two-phase isolation — stores blocked by step ordering, loads blocked by pending discard — is the mechanism that makes exceptions precise.
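The fault precedence rule can be expressed as a selection over in-flight slots. This is a sketch for reasoning and testing only. The pipeline itself needs no such search, because only the oldest slot ever reaches stage_WB(). FaultSlot and selectDeliveredFault are hypothetical names. slotSequence, faultPending, and faultPC are the fields named in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, reduced slot record.
struct FaultSlot {
    std::uint64_t slotSequence{0};
    bool valid{false};
    bool faultPending{false};
    std::uint64_t faultPC{0};
};

// Returns the fault that would be delivered: the one on the oldest valid
// faulting slot. Younger faults are never delivered, since the flush that
// accompanies delivery discards their slots.
std::optional<FaultSlot> selectDeliveredFault(const std::vector<FaultSlot>& slots) {
    std::optional<FaultSlot> oldest;
    for (const FaultSlot& s : slots) {
        if (!s.valid || !s.faultPending) continue;
        // Sequence numbers are assumed unique among in-flight slots.
        assert(!oldest || s.slotSequence != oldest->slotSequence);
        if (!oldest || s.slotSequence < oldest->slotSequence) oldest = s;
    }
    return oldest;
}
```

With faults pending on sequences 11 and 12, the function reports sequence 11. The fault on 12 is treated as an artifact of execution past the true fault point.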

 


 

K.6 PAL Instructions at Retirement

 

K.6.1 CALL_PAL

 

CALL_PAL is detected in stage_WB() after the fault check but before store commit. The retirement sequence for CALL_PAL:

 

1. m_pending is discarded. Any younger instruction's deferred result is squashed — CALL_PAL serializes the pipeline.

 

2. The PAL function code is extracted from the instruction bits: palFunction(slot.di.rawBits()).

 

3. The PAL entry vector is computed: computeCallPalEntry(m_cpuId, palFunction). This uses PAL_BASE from the IPR storage and the architectural vector calculation: PAL_BASE[63:15] | function encoding | PAL mode bit. Privileged functions (0x00–0x3F) and unprivileged functions (0x80–0xBF) use different vector offsets.

 

4. PipelineAction::PAL_CALL is set with palFunction, callPC (slot.di.pc), and palVector.

 

5. The slot is invalidated. CALL_PAL is not "retired" in the normal sense — it does not increment m_instructionsRetired or emit EXECTRACE_WB_RETIRE. It is a control transfer, not a completed computation.

 

6. AlphaCPU receives PAL_CALL, flushes the entire pipeline (flush()), and calls PalBox::enterPal() with PalEntryReason::CALL_PAL_INSTRUCTION. PalBox sets PC to the vector address (with bit 0 set for PAL mode), activates shadow registers, saves the return address (callPC + 4) to EXC_ADDR, and begins PAL execution.

 

Critical invariant: CALL_PAL is not a pending event. There is no "pending PAL transfer" flag that survives between steps. CALL_PAL is an instruction outcome handled immediately in stage_WB() — detection, vector computation, and action are atomic within the same step.
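For orientation, the vector calculation in step 3 can be sketched as follows. The bit layout used here is the 21264-style CALL_PAL entry layout (PC[63:15] = PAL_BASE[63:15], PC[14] = 0, PC[13] = 1, PC[12] = function bit 7, PC[11:6] = function bits 5:0, PC[0] = 1 for PAL mode). Whether computeCallPalEntry() uses exactly this layout is an assumption, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Privileged functions are 0x00-0x3F, unprivileged functions are 0x80-0xBF.
inline bool isValidCallPalFunction(std::uint32_t fn) {
    return fn <= 0x3F || (fn >= 0x80 && fn <= 0xBF);
}

// Sketch of a 21264-style CALL_PAL entry address (assumed layout, see above).
inline std::uint64_t callPalEntrySketch(std::uint64_t palBase, std::uint32_t fn) {
    assert(isValidCallPalFunction(fn));
    const std::uint64_t base = palBase & ~std::uint64_t{0x7FFF};  // PAL_BASE[63:15]
    const std::uint64_t unprivBit = (fn >> 7) & 1u;               // function bit 7
    const std::uint64_t index = fn & 0x3Fu;                       // function bits 5:0
    return base
         | (std::uint64_t{1} << 13)  // CALL_PAL region
         | (unprivBit << 12)         // selects privileged vs unprivileged block
         | (index << 6)              // 64-byte spacing between entry points
         | 1u;                       // PC bit 0: PAL mode
}
```

Under this layout, privileged entries start at offset 0x2000 from PAL_BASE and unprivileged entries at 0x3000, each spaced 64 bytes apart. For example, function 0x83 with PAL_BASE 0x10000 yields 0x130C1.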

 

K.6.2 HW_REI

 

HW_REI (Return from PAL) is the exit serialization point. When HW_REI retires in stage_WB():

 

1. The pipeline is fully flushed — all slots invalidated, MBox staging cleared, pending commits discarded.

 

2. PC is restored from EXC_ADDR (the return address saved on PAL entry).

 

3. PAL mode is cleared — PC bit 0 is set to 0, restoring normal execution mode.

 

4. LL/SC reservations are cleared for the executing CPU.

 

5. Shadow registers are deactivated — subsequent register references use the standard register file.

 

Together, CALL_PAL and HW_REI form a hard serialization boundary. No instruction from before the boundary survives into PAL mode, and no PAL-mode instruction survives into normal execution. Shadow register state, privileged IPR modifications, and PAL-mode memory access semantics are fully contained within the boundary.
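The entry/exit pairing can be sketched with a reduced state record. MiniPalState, enterPalSketch, hwReiSketch, and every field name are hypothetical. The sketch follows the steps as listed in K.6.1 and K.6.2 and does not model PalBox itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced per-CPU state.
struct MiniPalState {
    std::uint64_t pc{0};
    std::uint64_t excAddr{0};
    bool shadowRegsActive{false};
    bool reservationValid{false};
    bool pipelineEmpty{true};
};

inline bool inPalMode(const MiniPalState& s) { return (s.pc & 1u) != 0; }

// CALL_PAL entry as described in K.6.1.
void enterPalSketch(MiniPalState& s, std::uint64_t callPC, std::uint64_t palVector) {
    assert(!inPalMode(s));
    s.pipelineEmpty = true;     // full flush
    s.excAddr = callPC + 4;     // return address saved to EXC_ADDR
    s.pc = palVector | 1u;      // bit 0 set: PAL mode
    s.shadowRegsActive = true;
}

// HW_REI exit: the five steps of K.6.2, in order.
void hwReiSketch(MiniPalState& s) {
    assert(inPalMode(s));
    s.pipelineEmpty = true;         // 1. full flush
    s.pc = s.excAddr;               // 2. restore PC from EXC_ADDR
    s.pc &= ~std::uint64_t{1};      // 3. clear PAL mode (PC bit 0)
    s.reservationValid = false;     // 4. clear LL/SC reservation
    s.shadowRegsActive = false;     // 5. back to the standard register file
}
```

A round trip from callPC 0x20000 resumes at 0x20004 with PAL mode, shadow registers, and any reservation all cleared.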

 

K.6.3 PAL Functions That Modify Registers

 

PAL functions (MFPR, MTPR, SWPCTX, etc.) modify architectural registers directly through PalBox, not through the deferred writeback mechanism. PalBox writes to the register file, IPR storage, and HWPCB as part of the PAL function execution. These writes occur within PAL mode and are not subject to pipeline speculation — PAL mode is fully serialized (one instruction at a time, no speculative execution). The deferred writeback mechanism (m_pending / commitPending / deferWriteback) is not used for any PAL register modification.

 


 

K.7 Store Commit at Retirement

 

Store data reaches SafeMemory only at retirement (Step 4 of stage_WB). The store commit sequence:

 

1. The instruction must have store semantics: (slot.di.semantics & S_Store).

 

2. m_guestMemory->write64(slot.pa, slot.payLoad) writes the store data to the physical address computed during stage_EX(). This is an atomic write to SafeMemory.

 

3. m_reservationManager->breakReservationsOnCacheLine(slot.pa) clears any LL/SC reservations held by any CPU on the affected cache line. This ensures that a store between a LD_L and ST_C on another CPU will cause the ST_C to fail, preserving the atomicity contract.

 

Store ordering guarantee: Because only one instruction retires per cycle and retirement is in program order, stores are committed in program order. The memory model's weak ordering applies to visibility (when other CPUs see the store), not to the commit order. SafeMemory sees stores in strict program order; memory barriers (MB, WMB) control when those stores become globally visible.

 

Store-conditional (ST_C): ST_C checks the LL/SC reservation before writing. If the reservation is invalid (broken by another CPU's store or by an intervening exception), ST_C writes 0 to the destination register (indicating failure) and does not write to SafeMemory. If the reservation is valid, ST_C writes to SafeMemory and writes 1 to the destination register (indicating success). The reservation check occurs in stage_EX(); the store commit at retirement is conditional on the check result.
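The interaction between store commit and LL/SC reservations can be sketched as below. This is a simplified model under stated assumptions: a 64-byte cache line, four CPUs, a map standing in for SafeMemory, and the reservation check folded into the same function as the commit (the text places the check in stage_EX() and the write at retirement). All type and function names except write64 and breakReservationsOnCacheLine are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

constexpr std::size_t kCpuCount = 4;                     // assumption
constexpr std::uint64_t kLineMask = ~std::uint64_t{63};  // assumption: 64-byte lines

struct MiniReservationManager {
    struct Entry { bool valid{false}; std::uint64_t line{0}; };
    std::array<Entry, kCpuCount> entries{};

    void setReservation(std::size_t cpu, std::uint64_t pa) {  // LD_L
        assert(cpu < kCpuCount);
        entries[cpu] = Entry{true, pa & kLineMask};
    }
    bool isValid(std::size_t cpu, std::uint64_t pa) const {
        assert(cpu < kCpuCount);
        return entries[cpu].valid && entries[cpu].line == (pa & kLineMask);
    }
    // Break every CPU's reservation on the line containing pa.
    void breakReservationsOnCacheLine(std::uint64_t pa) {
        for (Entry& e : entries)
            if (e.valid && e.line == (pa & kLineMask)) e.valid = false;
    }
};

struct MiniMemory {  // stand-in for SafeMemory
    std::unordered_map<std::uint64_t, std::uint64_t> cells;
    void write64(std::uint64_t pa, std::uint64_t v) { cells[pa] = v; }
    std::uint64_t read64(std::uint64_t pa) const {
        auto it = cells.find(pa);
        return it == cells.end() ? 0 : it->second;
    }
};

// Ordinary store commit: write, then break reservations on the line.
void commitStore(MiniMemory& mem, MiniReservationManager& rm,
                 std::uint64_t pa, std::uint64_t value) {
    mem.write64(pa, value);
    rm.breakReservationsOnCacheLine(pa);
}

// ST_C: returns the value written to the destination register (1 or 0).
std::uint64_t storeConditional(MiniMemory& mem, MiniReservationManager& rm,
                               std::size_t cpu, std::uint64_t pa, std::uint64_t value) {
    if (!rm.isValid(cpu, pa)) return 0;  // reservation lost: no memory write
    commitStore(mem, rm, pa, value);     // success also consumes the reservation
    return 1;
}
```

If CPU 0 takes a reservation at 0x1000 and another store commits to 0x1008 (same line) before CPU 0's ST_C, storeConditional() returns 0 and memory at 0x1000 is unchanged. A store to a different line leaves the reservation intact.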

 


 

K.8 Retirement and Flush Interaction

 

Pipeline flush discards all in-flight instructions. The interaction between flush and the deferred writeback mechanism is architecturally critical:

 

Flush on fault/PAL: When stage_WB() detects a fault or CALL_PAL, it discards m_pending before returning. AlphaCPU then calls flush(), which clears all slots, clears MBox staging (clearMissStaging / clearIPRStaging), and resets the pipeline. The discarded pending result is from a younger instruction that would have been committed on the next cycle — discarding it ensures no speculative result survives the fault boundary.

 

Flush on misprediction: flushYoungerSlots() invalidates IF, DE, and IS. The EX and MEM slots are preserved — the mispredicting instruction in EX has already produced its result, and the instruction in MEM must still retire. m_pending is not discarded because it contains a valid result from the instruction currently in MEM.

 

Flush on interrupt: handleInterrupt() performs a full flush identical to the fault path. m_pending is discarded, all slots are cleared, and execution resumes at the interrupt vector in PAL mode.

 

Flush safety rule: commitPending() runs at the top of stage_WB(), before any flush decision is made. This ensures that the older instruction's deferred result is committed to the register file before the current cycle determines whether to flush. The committed result is architecturally valid — the producing instruction passed all checks. Even if the current WB instruction faults, the older instruction's result is correctly preserved.
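The difference in scope between the two flush operations can be summarized in a small model. MiniFlushModel and the stage index names are hypothetical. The sketch tracks only slot validity and whether a pending result exists, which is enough to show what each flush preserves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum StageIndex : std::size_t { kIF, kDE, kIS, kEX, kMEM, kWB, kStageCount };

struct MiniFlushModel {
    std::array<bool, kStageCount> slotValid{};
    bool pendingValid{false};  // stands in for m_pending holding a result

    bool isValid(std::size_t stage) const {
        assert(stage < kStageCount);
        return slotValid[stage];
    }

    // Misprediction: only the front end is discarded; the pending result
    // and the older slots survive.
    void flushYoungerSlots() {
        slotValid[kIF] = slotValid[kDE] = slotValid[kIS] = false;
    }

    // Fault, CALL_PAL, interrupt: every slot and the pending result are dropped.
    void flush() {
        slotValid.fill(false);
        pendingValid = false;
    }
};
```

Starting from a full pipeline, flushYoungerSlots() leaves EX, MEM, WB, and the pending result in place, while flush() clears everything.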

 


 

K.9 Retirement Invariants

 

The following invariants hold for every retirement and are verified by the testing framework (Chapter 22):

 

Single-instruction retirement. At most one instruction retires per cycle: one if the WB slot holds a valid instruction that commits normally, zero otherwise. Never two. This is guaranteed by the ring buffer, in which only one slot occupies the WB position at any time.

 

In-order retirement. Instructions retire in strict program order. An instruction at slotSequence N always retires before slotSequence N+1. The ring buffer advancement and single-issue design enforce this without additional logic.

 

Fault-before-commit. The fault check (Step 2) always executes before the store commit (Step 4). A faulting instruction never writes to SafeMemory.

 

PAL-before-commit. The CALL_PAL check (Step 3) always executes before the store commit (Step 4). CALL_PAL serializes without side effects.

 

Pending-from-older. The m_pending result committed by commitPending() (Step 0) is always from an instruction older than the instruction in WB. The producing instruction already passed its own fault and retirement checks.

 

commitPending-before-EX. commitPending() runs before stage_EX() in every cycle (WB executes first in the reverse stage order). The register file is always current before any new instruction reads it.

 

No speculative store. Store data never reaches SafeMemory until the instruction retires successfully in stage_WB(). A faulting or flushed store leaves no trace in SafeMemory.

 

Reservation break on store. Every store that commits to SafeMemory breaks LL/SC reservations on the affected cache line. No store bypasses the reservation manager.

 

SMP independence. Retirement is per-CPU. Each CPU retires its own instructions independently. Store visibility to other CPUs is governed by the memory model and barrier semantics, not by retirement order across CPUs.
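Two of these invariants, single-instruction retirement and in-order retirement, lend themselves to a simple runtime check. The helper below is a hypothetical sketch of such a check, not part of the testing framework described in Chapter 22. It assumes the caller reports each retirement with its cycle number and slotSequence.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical test helper: feed it every retirement and it reports whether
// single-instruction and in-order retirement still hold.
class RetirementChecker {
public:
    // Returns true if this retirement respects both invariants.
    bool onRetire(std::uint64_t cycle, std::uint64_t slotSequence) {
        bool ok = true;
        if (m_last) {
            ok = cycle > m_last->cycle                 // at most one per cycle
              && slotSequence > m_last->slotSequence;  // strict program order
        }
        m_last = Record{cycle, slotSequence};
        return ok;
    }

    // Convenience for debug builds: abort on the first violation.
    void expectRetire(std::uint64_t cycle, std::uint64_t slotSequence) {
        const bool ok = onRetire(cycle, slotSequence);
        assert(ok && "retirement invariant violated");
        (void)ok;
    }

private:
    struct Record {
        std::uint64_t cycle;
        std::uint64_t slotSequence;
    };
    std::optional<Record> m_last;
};
```

Gaps in slotSequence are accepted on purpose, since faulting instructions and CALL_PAL consume a sequence number without retiring.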

 

See Also: 13.6.6 Stage Implementations - stage_WB(); 13.10 Exception Precision; Chapter 18 – Fault Dispatcher & Precise Exceptions; Chapter 20 – Boot Sequence, PAL, and SRM Integration (CALL_PAL/HW_REI); Chapter 11 - Architectural Invariants; cpuCoreLib/AlphaPipeline.h (stage_WB implementation).