User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
PowerPC 750GX Overview
March 27, 2006
1.2.2.3 Load/Store Unit (LSU)
The LSU executes all load-and-store instructions and provides the data-transfer interface between the GPRs,
FPRs, and the data-cache/memory subsystem. The LSU functions as a 2-stage pipelined unit. The first stage
calculates the effective address. The second stage translates the address, accesses the cache, and aligns the
data if necessary. Unless extensive data alignment is required (for example, to cross a double-word
boundary), instructions complete in two cycles with a 1-cycle throughput. The LSU
also provides sequencing for load/store string and multiple register transfer instructions.
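The relationship between the 2-cycle latency and the 1-cycle throughput can be illustrated with a minimal sketch of an ideal pipeline. The function name `total_cycles` and its parameters are illustrative assumptions, not part of the 750GX documentation, and the model ignores cache misses, alignment penalties, and data dependencies.

```python
def total_cycles(n_ops: int, latency: int = 2, issue_interval: int = 1) -> int:
    """Cycles needed to finish n_ops back-to-back operations in an ideal pipeline.

    The first result is available after `latency` cycles. Each later
    operation finishes `issue_interval` cycles after the one before it.
    """
    if n_ops < 0:
        raise ValueError("n_ops must be non-negative")
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


# Hypothetical workloads: 1, 2, and 8 independent loads that all hit in the cache.
for n in (1, 2, 8):
    print(f"{n} load(s): {total_cycles(n)} cycles")
```

Under these assumptions, a single load takes two cycles, and each additional independent load adds one cycle, so eight loads finish in nine cycles rather than sixteen.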
Load-and-store instructions are translated and issued in program order. However, some memory accesses
can occur out of order. Synchronizing instructions can be used to enforce strict ordering if necessary. When
there are no data dependencies and the guard bit for the page or block is cleared, a maximum of one
out-of-order cacheable load operation can execute per cycle, with a 2-cycle total latency on a cache hit. Data
returned from the cache is held in a rename buffer until the completion logic commits the value to a GPR or
FPR. Stores cannot be executed out of order and are held in the store queue until the completion logic
signals that the store operation is to be completed to memory. The 750GX executes store instructions with a
maximum throughput of one per cycle and a 3-cycle latency to the data cache. The time required to perform
the actual load or store operation depends on the processor/bus clock ratio and whether the operation
involves the L1 cache, the L2 cache, system memory, or an I/O device.
The LSU has two reservation stations, Eib0 and Eib1. For loads, there is also a hold queue and a miss
queue. A load that misses in the data cache advances from Eib0 to the miss queue, where only the state
necessary for instruction completion, such as the instruction ID and register rename ID, is stored. If another
load misses under an outstanding miss, it is held in the hold queue and Eib0 is freed. Two more load
instructions can then be dispatched to Eib0 and Eib1. The Miss-under-Miss feature allows the load requests
in the hold queue, Eib0, and Eib1 to proceed out to the bus, even though there is an outstanding miss that
would otherwise stall the pending loads.
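The movement of loads between the reservation stations and the two queues can be traced with a small bookkeeping model. This is only a sketch: the class `LoadPath`, its method names, and the load labels are hypothetical, and the single-entry depth assumed for the hold queue and the miss queue is an assumption made for illustration, since the text above does not state queue depths. The model tracks occupancy only and does not represent timing or bus activity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadPath:
    """Occupancy of the load structures (each assumed to hold one load)."""
    eib0: Optional[str] = None
    eib1: Optional[str] = None
    hold: Optional[str] = None
    miss: Optional[str] = None

    def dispatch(self, load_id: str) -> bool:
        """Place a load in the first free reservation station."""
        if self.eib0 is None:
            self.eib0 = load_id
            return True
        if self.eib1 is None:
            self.eib1 = load_id
            return True
        return False  # both reservation stations are occupied

    def eib0_misses(self) -> bool:
        """Record a data-cache miss for the load currently in Eib0."""
        if self.eib0 is None:
            raise RuntimeError("no load in Eib0")
        if self.miss is None:
            self.miss = self.eib0      # first miss goes to the miss queue
        elif self.hold is None:
            self.hold = self.eib0      # miss under an outstanding miss
        else:
            return False               # assumed: the load stays in Eib0
        self.eib0 = None               # Eib0 is freed
        return True

    def in_flight(self) -> int:
        """Count the loads currently held anywhere in the model."""
        slots = (self.eib0, self.eib1, self.hold, self.miss)
        return sum(slot is not None for slot in slots)


lsu = LoadPath()
lsu.dispatch("A")
lsu.eib0_misses()      # A moves to the miss queue
lsu.dispatch("B")
lsu.eib0_misses()      # B misses under A and moves to the hold queue
lsu.dispatch("C")      # two more loads occupy Eib0 and Eib1
lsu.dispatch("D")
print(lsu, lsu.in_flight())
```

In this trace, load A ends in the miss queue, load B in the hold queue, and loads C and D in Eib0 and Eib1, which matches the sequence described in the paragraph above.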
1.2.2.4 System Register Unit (SRU)
The SRU executes various system-level instructions, as well as Condition Register logical operations and
Move-to/Move-from Special-Purpose Register instructions. To maintain system state, most instructions
executed by the SRU are execution-serialized with other instructions; that is, the instruction is held for
execution in the SRU until all previously issued instructions have been retired. Results from
execution-serialized instructions executed by the SRU are not available or forwarded for subsequent
instructions until the instruction completes.
1.2.3 Memory Management Units (MMUs)
The 750GX’s MMUs support up to 4 petabytes (2^52) of virtual memory and 4 gigabytes (2^32) of physical
memory for instructions and data. The MMUs also control access privileges for these spaces on block and
page granularities. Referenced and changed status is maintained by the processor for each page to support
demand-paged virtual memory systems.
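The address-space sizes quoted above follow directly from the address widths. The short check below uses binary units (1 petabyte taken as 2^50 bytes and 1 gigabyte as 2^30 bytes); the constant names are illustrative.

```python
VIRTUAL_ADDRESS_BITS = 52    # virtual address width stated above
PHYSICAL_ADDRESS_BITS = 32   # physical address width stated above

PETABYTE = 2**50             # binary petabyte, in bytes
GIGABYTE = 2**30             # binary gigabyte, in bytes

virtual_bytes = 2**VIRTUAL_ADDRESS_BITS
physical_bytes = 2**PHYSICAL_ADDRESS_BITS

print(virtual_bytes // PETABYTE)         # 4 (petabytes of virtual memory)
print(physical_bytes // GIGABYTE)        # 4 (gigabytes of physical memory)
print(virtual_bytes // physical_bytes)   # 1048576, that is, 2**20
```

The virtual space is therefore 2^20 times larger than the physical space.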
The LSU, with the aid of the MMU, translates effective addresses for data loads and stores. The effective
address is calculated on the first cycle, and the MMU translates it to a physical address at the same time it is
accessing the L1 cache on the second cycle. The MMU also provides the necessary control and protection
information to complete the access. By the end of the second cycle, the data and control information are
available if no miss conditions for translation and cache access were encountered. This yields a 1-cycle throughput
and a 2-cycle latency.