#### PIII Data Stream

- Power Saving Modes
- Buses
- System
- Cache
- Memory Order Buffer
- Memory Hierarchy
- L1 Cache
- L2 Cache

| Clock State         | Exit Latency                                           | Snooping? | System Uses                                                    |
|---------------------|--------------------------------------------------------|-----------|----------------------------------------------------------------|
| Normal              | N/A                                                    | Yes       | Normal program execution                                       |
| Auto Halt           | Approximately 10 bus clocks                            | Yes       | S/W controlled entry idle mode                                 |
| Stop Grant          | 10 bus clocks                                          | Yes       | H/W controlled entry/exit mobile throttling                    |
| Quick Start         | Through snoop, to HALT/Grant Snoop state: immediate<br>Through STPCLK#, to Normal state: 8 bus clocks | Yes       | H/W controlled entry/exit mobile throttling                    |
| HALT/Grant<br>Snoop | A few bus clocks after the end<br>of snoop activity    | Yes       | Supports snooping in the low power states                      |
| Sleep               | To Stop Grant state 10 bus<br>clocks                   | No        | H/W controlled entry/exit desktop idle mode<br>support         |
| Deep Sleep          | 30 µsec                                                | No        | H/W controlled entry/exit mobile powered-on<br>suspend support |

Power Saving Modes
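The exit latencies in the table are given in bus clocks, so their wall-clock cost depends on the bus frequency. The sketch below transcribes the table into a lookup and converts clocks to nanoseconds. The names `CLOCK_STATES` and `exit_latency_ns` are hypothetical, and the nominal 133 MHz figure is taken as exactly 133.0 for illustration.

```python
# Exit latencies transcribed from the clock-state table.
# None marks entries the table does not give as a bus-clock count.
CLOCK_STATES = {
    "Normal":           {"exit_bus_clocks": None, "snoops": True},
    "Auto Halt":        {"exit_bus_clocks": 10,   "snoops": True},   # "approximately"
    "Stop Grant":       {"exit_bus_clocks": 10,   "snoops": True},
    "Quick Start":      {"exit_bus_clocks": 8,    "snoops": True},   # via STPCLK# to Normal
    "HALT/Grant Snoop": {"exit_bus_clocks": None, "snoops": True},   # "a few bus clocks"
    "Sleep":            {"exit_bus_clocks": 10,   "snoops": False},  # to Stop Grant
    "Deep Sleep":       {"exit_bus_clocks": None, "snoops": False},  # given as 30 microseconds
}


def exit_latency_ns(state, bus_mhz=133.0):
    """Convert a state's exit latency from bus clocks to nanoseconds."""
    clocks = CLOCK_STATES[state]["exit_bus_clocks"]
    if clocks is None:
        return None
    return clocks * 1000.0 / bus_mhz


if __name__ == "__main__":
    for name in CLOCK_STATES:
        print(name, exit_latency_ns(name))
```

Under these assumptions, leaving Stop Grant costs roughly 75 ns, several hundred times less than the 30 µs Deep Sleep exit.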








# PIII Buses At-a-Glance

| Address Bus Width                              | 36 Bit                                     |
|------------------------------------------------|--------------------------------------------|
| Data Bus Width                                 | 64 Bit                                     |
| Dual Independent Bus<br>(DIB) dedicated for L2 | 64+8 Bit (0.25 μm)<br>256+32 Bit (0.18 μm) |

#### PIII System Bus

- 133 MHz
- ECC error checking
- Supports multiple processors
- 4 write back buffers
- 6 fill buffers
- 8 bus queue entries
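The bus widths and clock rate listed so far imply two headline numbers: the physical address space and the peak system-bus bandwidth. The sketch below works them out. It assumes one data transfer per bus clock, which the slides do not state, and the function names are hypothetical.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


def peak_bandwidth_mb_s(bus_mhz, data_bits):
    """Peak bandwidth in MB/s, assuming one transfer per bus clock (an assumption)."""
    return bus_mhz * data_bits / 8


if __name__ == "__main__":
    print(addressable_bytes(36) // 2 ** 30, "GiB addressable")  # 36-bit address bus
    print(peak_bandwidth_mb_s(133, 64), "MB/s peak")            # 64-bit bus at 133 MHz
```

With 36 address bits the processor can address 64 GiB, and a 64-bit bus at 133 MHz peaks at about 1064 MB/s under the stated assumption.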

# PIII Bus Enhancements

- Pentium II Write Buffers
- Removed dead cycle
- Using all fill buffers as WC fill buffers

# Memory Order Buffer (MOB)

- Load Buffer (LB)
- 16 entries
- Store Buffer (SB)
- 12 entries
- Re-dispatches µops
- Cache bandwidth

#### Memory Order Buffer (MOB) Re-Ordering

- Stores cannot pass other loads or stores
- Loads can pass other loads, but cannot pass stores
- Store Coloring
- Multiprocessing dilemma
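The two re-ordering rules above can be written as a small predicate. This is a minimal sketch of the rules as stated on the slide only: it ignores addresses, store coloring, and forwarding, and the names `may_pass` and `is_legal_order` are hypothetical.

```python
def may_pass(younger, older):
    """True if a younger memory op may execute ahead of an older one.

    Stores never pass anything; loads may pass older loads but not older stores.
    """
    if younger == "store":
        return False
    return older == "load"


def is_legal_order(program, order):
    """Check an execution order against the rules.

    program: list of "load"/"store" in program order.
    order:   list of program indices in the order they execute.
    """
    position = {idx: pos for pos, idx in enumerate(order)}
    for older in range(len(program)):
        for younger in range(older + 1, len(program)):
            executed_early = position[younger] < position[older]
            if executed_early and not may_pass(program[younger], program[older]):
                return False
    return True


if __name__ == "__main__":
    print(is_legal_order(["load", "load", "store"], [1, 0, 2]))  # load passes load
    print(is_legal_order(["load", "store", "load"], [0, 2, 1]))  # load passes store
```

The first order is accepted because a load only overtakes another load; the second is rejected because a load overtakes an older store.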

### PIII Cache Design

- Harvard Architecture for L1
  - Unified for L2
- Inclusive

# Inclusive vs. Exclusive

- Inclusive: reduces effective size of lower level caches
- Exclusive: data resides in one cache
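The capacity trade-off can be put in numbers. The sketch below is an idealized model that assumes strict inclusion (every L1 line is duplicated in L2) or strict exclusion (no line is in both); `unique_capacity_kb` is a hypothetical helper, and the sizes come from the L1 and L2 slides in this deck.

```python
def unique_capacity_kb(l1_kb, l2_kb, inclusive):
    """Distinct data the two levels can hold together, in KB (idealized)."""
    if inclusive:
        # Every L1 line also occupies an L2 line, so L1 adds no unique capacity.
        return l2_kb
    # Exclusive: a line lives in exactly one level, so capacities add up.
    return l1_kb + l2_kb


if __name__ == "__main__":
    l1_total = 16 + 16  # L1 instruction + L1 data, in KB
    print(unique_capacity_kb(l1_total, 256, inclusive=True))
    print(unique_capacity_kb(l1_total, 256, inclusive=False))
```

For a 256 KB L2, the inclusive design holds 256 KB of distinct data while an exclusive one would hold 288 KB under this model.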

### L1 Instruction Cache

- Non-blocking 16 KB
- 4-way associativity
- 32 Byte/Line
- SI (Shared/Invalid) coherency states
- Fetch Port
- Internal and External Snoop Port
- Least Recently Used

#### L1 Data Cache

- Non-blocking 16 KB
- 4-way associativity
- 32 Bytes/Line
- MESI
- Dual-ported
- Snoop Port
- Write Allocate
- Least Recently Used
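The size, associativity, and line length of the L1 caches determine how an address splits into tag, set index, and byte offset. The sketch below derives that split, assuming power-of-two sizes and the conventional scheme of indexing with the low address bits above the offset; the slides do not state the actual indexing scheme, and both function names are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    sets, offset_bits, index_bits = cache_geometry(16 * 1024, 4, 32)
    print(sets, offset_bits, index_bits)
    print(split_address(0x12345, offset_bits, index_bits))
```

A 16 KB, 4-way cache with 32-byte lines has 128 sets, so 5 bits select the byte within a line and 7 bits select the set.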



- Discrete Level 2 Cache
- Advanced Transfer Cache



### Discrete L2 Cache

- 512 KB+ off-die
- 64 Bit bus
- 4-way set associativity
- Slower, but bigger

# Advanced Transfer Cache

- 256 KB on-die
- 256 Bit Bus
- 8-way associativity
- Faster, but smaller
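One way to see why the wider bus matters is to count bus transfers per cache line. The sketch below assumes the L2 line is 32 bytes, the same as the L1 line size given earlier; the slides do not state the L2 line size, and the helper names are hypothetical.

```python
def transfers_per_line(line_bytes, bus_bits):
    """Bus transfers needed to move one cache line (rounded up)."""
    bus_bytes = bus_bits // 8
    return -(-line_bytes // bus_bytes)  # ceiling division


def number_of_sets(size_kb, ways, line_bytes):
    """Sets in a set-associative cache of the given size."""
    return size_kb * 1024 // (ways * line_bytes)


if __name__ == "__main__":
    LINE = 32  # assumed L2 line size in bytes
    print("discrete L2:", transfers_per_line(LINE, 64), "transfers,",
          number_of_sets(512, 4, LINE), "sets")
    print("ATC:        ", transfers_per_line(LINE, 256), "transfer,",
          number_of_sets(256, 8, LINE), "sets")
```

Under that assumption, the 64-bit discrete L2 bus needs four transfers per line, while the 256-bit Advanced Transfer Cache bus moves a whole line in one.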

L2 Cache Effects on Power

| Proc. Core Freq. (MHz) | L2 Cache Size (Kbytes) | Thermal Design Power<sup>2</sup> (W) | L2 Cache Power (W) | Power Density<sup>4</sup> (W/cm<sup>2</sup>) Up to CPUID 0683h | Power Density<sup>4</sup> (W/cm<sup>2</sup>) For CPUID 0686h<sup>6</sup> | T<sub>JUNCTION</sub> (°C) | T<sub>JUNCTION</sub> Offset (°C) | L2 Cache Min T<sub>CASE</sub> (°C) | L2 Cache Max T<sub>CASE</sub> (°C) | Min T<sub>COVER</sub> (°C) | Max T<sub>COVER</sub> (°C) |
|-------|-----|------|------|-------------------|------|----|------------------|-----|-----|---|----|
| 450   | 512 | 25.3 | 1.26 | 21.6<sup>5</sup>  | N/A  | 90 | 4.8              | 5   | 105 | 5 | 75 |
| 500   | 512 | 28.0 | 1.33 | 23.9<sup>5</sup>  | N/A  | 90 | 4.8              | 5   | 105 | 5 | 75 |
| 533B  | 512 | 29.7 | 1.37 | 25.4<sup>5</sup>  | N/A  | 90 | 4.8              | 5   | 105 | 5 | 75 |
| 533EB | 256 | 14.0 | N/A  | 19.3<sup>6</sup>  | 22.0 | 82 | 2.0<sup>7</sup>  | N/A | N/A | 5 | 75 |
| 550   | 512 | 30.8 | 1.37 | 26.3<sup>5</sup>  | N/A  | 80 | 4.8              | 5   | 105 | 5 | 75 |
| 550E  | 256 | 14.5 | N/A  | 20.0<sup>6</sup>  | 22.8 | 82 | 2.1<sup>7</sup>  | N/A | N/A | 5 | 75 |
| 600   | 512 | 34.5 | 1.60 | 29.5<sup>5</sup>  | N/A  | 85 | 4.8              | 5   | 105 | 5 | 75 |
| 600B  | 512 | 34.5 | 1.60 | 29.5<sup>5</sup>  | N/A  | 85 | 4.8              | 5   | 105 | 5 | 75 |
| 600E  | 256 | 15.8 | N/A  | 21.8<sup>6</sup>  | 24.8 | 82 | 2.3<sup>7</sup>  | N/A | N/A | 5 | 75 |
| 600EB | 256 | 15.8 | N/A  | 21.8<sup>6</sup>  | 24.8 | 82 | 2.3<sup>7</sup>  | N/A | N/A | 5 | 75 |
| 650   | 256 | 17.0 | N/A  | 23.4<sup>6</sup>  | 26.7 | 82 | 2.5<sup>7</sup>  | N/A | N/A | 5 | 52 |
| 667   | 256 | 17.5 | N/A  | 24.1<sup>6</sup>  | 27.5 | 82 | 2.5<sup>7</sup>  | N/A | N/A | 5 | 52 |
| 700   | 256 | 18.3 | N/A  | 25.2<sup>6</sup>  | 28.7 | 80 | 2.7<sup>7</sup>  | N/A | N/A | 5 | 52 |
| 733   | 256 | 19.1 | N/A  | 26.3<sup>6</sup>  | 30.0 | 80 | 2.8<sup>7</sup>  | N/A | N/A | 5 | 75 |
| 750   | 256 | 19.? | N/A  | ?<sup>6</sup>     | 30.? | 8? | 2.?              | N/A | N/A | 5 | 75 |

# Software Controlled Caching

- Streaming Data Trashes Cache
- Skip levels in Memory Hierarchy
- Senior Load
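The first bullet, streaming data trashing the cache, can be illustrated with a toy simulation. The sketch below models a tiny fully associative LRU cache, which is a simplification of the 4-way L1 described earlier, and uses a hypothetical `allocate` flag as a stand-in for a software hint that lets streaming accesses skip the cache. It does not model the actual instructions or hardware behavior.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()

    def access(self, line, allocate=True):
        """Return True on a hit. On a miss, fill the line only if allocate is set."""
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        if allocate:
            self.lines[line] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return False


def hits_after_stream(bypass):
    """Warm a working set, stream unrelated data, then count working-set hits."""
    cache = LRUCache(capacity_lines=8)
    working_set = range(8)
    for line in working_set:
        cache.access(line)
    for line in range(1000, 1064):  # 64 lines touched once and never reused
        cache.access(line, allocate=not bypass)
    return sum(cache.access(line) for line in working_set)


if __name__ == "__main__":
    print("stream allocates in cache:", hits_after_stream(bypass=False), "hits")
    print("stream bypasses cache:   ", hits_after_stream(bypass=True), "hits")
```

When the stream allocates, the whole working set is evicted and none of the eight re-accesses hit; when it bypasses the cache, all eight hit.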