#### Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy

Seongmoo Heo, Ronny Krashinsky, Krste Asanović MIT - Laboratory for Computer Science http://www.cag.lcs.mit.edu/scale

> ARVLSI March 15, 2001

- Critical Timing Elements (TEs) in modern synchronous VLSI systems
  - ✓ Significant impact on cycle time
  - ✓ Big portion of energy consumption





## **Motivation**

- Previous work tried to find the most energy-efficient and fastest TEs
  - ✓ assuming a single TE design used uniformly throughout a circuit.
  - ✓ using a very limited set of data patterns and an un-gated clock signal.
- Two important observations
  - There is a wide variation in clock and data activity across different TEs.
  - Many TEs are not in the critical path, and thus have ample time slack.

#### **Basic Idea**

- Selection from a heterogeneous library of designs, each tuned to different operating regimes
- Operating regimes :
  - Different input and clock signal activities
  - Different speed requirements
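
The selection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TEDesign` fields, the three library entries, and every delay and energy number are hypothetical placeholders, not measurements from this work. Each TE instance picks the feasible design (delay within its slack) with the lowest expected energy for its own clock and data activity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TEDesign:
    """One library entry. All fields hold hypothetical, illustrative values."""
    name: str
    delay: float         # D-Q delay, arbitrary units
    clock_energy: float  # energy per cycle in which the clock toggles
    data_energy: float   # energy per cycle in which the data input toggles


# Hypothetical heterogeneous library (placeholder numbers, not measured data).
LIBRARY = [
    TEDesign("fast", delay=1.0, clock_energy=10.0, data_energy=10.0),
    TEDesign("low_clock_energy", delay=2.0, clock_energy=2.0, data_energy=12.0),
    TEDesign("low_data_energy", delay=2.0, clock_energy=8.0, data_energy=3.0),
]


def expected_energy(design, clock_activity, data_activity):
    """Expected energy per cycle for activities given as fractions of cycles."""
    return (clock_activity * design.clock_energy
            + data_activity * design.data_energy)


def select_te(library, max_delay, clock_activity, data_activity):
    """Pick the lowest-energy design whose delay fits the available slack."""
    feasible = [d for d in library if d.delay <= max_delay]
    if not feasible:
        raise ValueError("no design meets the delay requirement")
    return min(feasible,
               key=lambda d: expected_energy(d, clock_activity, data_activity))
```

With these placeholder numbers, a critical-path TE (`max_delay=1.0`) keeps the `fast` design, while a TE with slack, a busy clock, and mostly idle data would pick `low_clock_energy`.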

### **Related Work**

- The use of timing slack for reduced energy
  - Examples :
    - Traditional transistor sizing
    - Cluster voltage scaling [Usami and Horowitz '95]
    - Multiple threshold voltage or series transistor for reducing leakage current [McPherson *et al.* '00, Yamashita *et al.* '00, Johnson *et al.* '99]

## **Our Contribution**

- Detailed energy characterization of a wide range of TEs as a function of signal activity.
- Detailed measurement of TE signal activities for a microprocessor running complete programs.
- Exploitation of signal activity to reduce TE energy by using different TE structures.

### **Overview**

- Flip-Flop and Latch Designs
- Test Bench and Simulation Setup
- Delay and Energy Characterization
- Energy Analysis with Test Waveforms
- Evaluation with Processor
- Conclusion

#### **Latch Designs**



Transistor sizes optimized for two extremes: Highest speed vs. Lowest power

#### **Flip-Flop Designs**





Transistor sizes optimized for two extremes: Highest speed vs. Lowest power

#### **Test Bench**

- Used fixed, realistic input driver
- Determined appropriate output load
  - As large as 200fF output load was used by previous work.
  - We used 7.2fF (4 min-inv cap) because 60% of output loads in the VP microprocessor datapath are smaller than 14.4fF.
  - Further work on load-sensitive analysis at upcoming WVLSI
- Sized clock buffer to give equal rise/fall time



### **Simulation Setup**

- Custom layout in 0.25µm TSMC CMOS process with Magic layout program
- Layout extraction with SPACE 2D extractor
- Circuit simulation with Hspice under nominal conditions of Vdd = 2.5 V and T = $25^{\circ}C$
  - Hspice *.measure* command used to measure delay and energy

#### **Delay Characterization**

- Flip-flop : Minimum D-Q delay [Stojanovic et al. '99]
- Latch : D-Q delay



## **Energy Characterization**

- Total energy accounts for four components: input energy, internal energy, clock energy, and output energy
- Accurate energy characterization
  - State-transition technique based on [Zyuban and Kogge '99]

[Figure: test bench with Data In and Clock inputs, the TE, and a 7.2 fF load]

## **Energy Tables**

#### (a) Flip-flops

| TE | 000→100 | 001→100 | 010→111 | 011→111 | 100→000 | 110→010 | 101→001 | 111→011 | 000→010 | 100→110 | 101→111 | 001→011 | 010→000 | 110→100 | 111→101 | 011→001 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| **Low-Power Flip-Flop** | | | | | | | | | | | | | | | | |
| PPCFF | 48.4 | 95.5<br>95.4 | 89.2<br>89.0 | 47.6 | 46.3<br>46.0 | 100.9 | 91.5 | 49.1<br>46.8 | 68.1 | 19.4<br>19.2 | 19.4 | 68.1<br>68.0 | 49.7<br>49.7 | 6.9 | 6.9<br>6.9 | 51.2 |
| SSAFF | 21.1 | 92.2 | 103.8 | 21.2 | 21.9 | 101.8 | 101.0 | 21.9 | 115.9 | 56.1 | 43.2 | 114.2 | 103.1 | 33.4 | 37.4 | 103.7 |
| SAFF | 65.8 | 112.9 | 118.0 | 68.1 | 53.9 | 54.2 | 59.8 | 61.9 | 26.4 | 28.3 | 28.2 | 26.5 | 15.6 | 17.0 | 17.8 | 15.6 |
| MSAFF | 96.2 | 156.2 | 149.8 | 98.7 | 93.0<br>95.7 | 98.5<br>91.7 | 87.3<br>90.9 | 94.0<br>88.3 | 26.5 | 28.3<br>28.3 | 28.2<br>28.2 | 26.6 | 15.9 | 16.9<br>17.0 | 17.8<br>16.9 | 15.7 |
| HLFF | 106.4<br>129.3 | 188.8<br>183.3 | 330.3 | 237.2 | 91.4<br>92.4 | 102.3 | 113.1 | 123.5 | 24.5<br>24.5 | 18.2<br>15.4 | 15.6 | 24.7<br>22.6 | 6.0 | 10.2 | 10.5 | 6.0 |
| HLSFF | 49.7<br>71.8 | 138.6<br>132.3 | 273.6 | 207.1 | 66.1<br>66.0 | 76.5 | 84.7 | 95.5 | 27.9<br>35.7 | 18.1<br>16.1 | 16.5 | 27.6<br>23.4 | 9.3 | 10.1 | 10.3 | 9.3 |
| SSAPL | 98.4 | 187.2 | 181.9 | 99.3 | 64.8 | 74.6 | 72.9 | 65.8 | 72.7 | 82.2 | 70.1 | 53.1 | 39.7 | 53.6 | 52.0 | 47.6 |
| SSASPL | 68.8 | 140.7 | 151.9 | 68.8 | 19.5 | 19.5 | 19.5 | 19.5 | 49.8 | 49.8 | 37.0 | 37.0 | 27.4 | 27.4 | 30.3 | 30.3 |
| CCPPCFF | 21.4 | 416.9<br>416.7 | 366.9<br>366.8 | 21.5 | 27.6<br>43.6 | 268.4 | 276.8 | 43.4<br>27.5 | 278.4 | 71.3<br>84.9 | 61.6 | 138.3<br>149.0 | 96.8<br>102.6 | 39.8 | 63.7<br>54.3 | 248.6 |
| **High-Speed Flip-Flop** | | | | | | | | | | | | | | | | |
| PPCFF | 57.9 | 115.3<br>115.1 | 97.8<br>98.0 | 49.3 | 47.1<br>47.0 | 119.5 | 106.6 | 57.7<br>54.9 | 87.7 | 19.6<br>19.5 | 19.9 | 88.4<br>88.3 | 61.5<br>61.9 | 9.3 | 9.2<br>9.1 | 62.1 |
| SSAFF | 66.5 | 273.8 | 185.4 | 66.9 | 41.4 | 199.8 | 196.2 | 41.0 | 216.5 | 92.5 | 71.5 | 205.9 | 180.1 | 55.4 | 60.3 | 191.5 |
| SAFF | 164.8 | 246.9 | 257.2 | 164.7 | 105.1 | 97.7 | 110.4 | 125.4 | 39.8 | 48.6 | 48.6 | 41.9 | 29.6 | 35.6 | 36.2 | 26.9 |
| MSAFF | 211.4 | 288.5 | 263.8 | 172.9 | 169.1<br>173.0 | 172.8<br>168.1 | 125.7<br>129.5 | 134.5<br>130.4 | 35.6 | 43.2<br>43.1 | 42.5<br>42.5 | 36.4 | 26.8 | 28.1<br>28.2 | 29.1<br>28.9 | 24.0 |
| HLFF | 174.7<br>209.3 | 272.3<br>260.3 | 443.6 | 382.4 | 175.5<br>179.8 | 212.7 | 217.8 | 251.9 | 51.5<br>51.2 | 29.7<br>24.3 | 24.7 | 50.8<br>45.9 | 5.6 | 16.0 | 15.1 | 5.5 |
| HLSFF | 89.3<br>125.9 | 210.4<br>196.3 | 397.6 | 325.6 | 167.0<br>166.2 | 194.0 | 206.4 | 233.2 | 51.8<br>59.2 | 29.3<br>27.2 | 26.8 | 51.7<br>46.1 | 5.8 | 16.8 | 15.5 | 5.8 |
| SSAPL | 135.3 | 254.9 | 223.6 | 136.1 | 94.3 | 110.8 | 110.5 | 96.8 | 100.7 | 130.8 | 108.9 | 80.4 | 43.4 | 73.1 | 77.1 | 65.7 |
| SSASPL | 108.6 | 234.7 | 209.4 | 108.5 | 19.5 | 19.5 | 19.5 | 19.5 | 101.2 | 101.2 | 68.7 | 68.7 | 39.7 | 39.7 | 60.3 | 60.3 |
| CCPPCFF | 44.7 | 414.1<br>414.1 | 383.6<br>383.1 | 45.4 | 36.9<br>59.0 | 342.3 | 335.1 | 59.2<br>36.6 | 340.0 | 64.9<br>97.5 | 68.5 | 170.1<br>173.6 | 116.3<br>121.6 | 48.1 | 77.4<br>44.9 | 296.7 |

#### (b) Latches

| TE | 000→100 | 001→100 | 010→111 | 011→111 | 100→000 | 111→011 | 000→010 | 001→011 | 010→000 | 011→001 | 100→111 | 111→100 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|
| **Low-Power Latch** | | | | | | | | | | | | |
| PPCLA | 22.8 | 56.5 | 79.8 | 21.2 | 23.4<br>24.4 | 24.9<br>24.7 | 19.2 | 18.0 | 6.1 | 6.8 | 77.1<br>73.5 | 48.2<br>47.0 |
| PTLA | 18.3 | 226.5 | 95.0 | 29.3 | 0 | 0 | 32.3 | 32.4 | 32.0 | 30.1 | 90.8 | 266.8 |
| SSALA | 21.9 | 93.8 | 105.0 | 21.9 | 0 | 0 | 49.8 | 37.0 | 27.4 | 30.3 | 110.4 | 91.2 |
| SSA2LA | 23.9<br>27.0 | 98.9 | 107.3 | 26.1<br>23.9 | 0 | 0 | 33.5<br>32.9 | 32.9 | 23.7 | 24.4<br>23.7 | 119.2 | 99.7 |
| CPNLA | 45.0 | 74.4 | 1051.8 | 897.9 | 45.2<br>46.7 | 71.1<br>71.1 | 16.9 | 16.9 | 1.5 | 1.6 | 1100.5<br>1047.6 | 128.4<br>128.3 |
| **High-Speed Latch** | | | | | | | | | | | | |
| PPCLA | 22.7 | 54.5 | 71.8 | 24.6 | 25.9<br>27.1 | 24.3<br>24.6 | 19.7 | 18.0 | 8.2 | 9.1 | 68.0<br>68.4 | 45.1<br>44.8 |
| PTLA | 24.7 | 152.4 | 141.7 | 54.4 | 0 | 0 | 54.4 | 55.3 | 67.1 | 59.9 | 156.8 | 188.1 |
| SSALA | 47.4 | 173.5 | 148.2 | 47.3 | 0 | 0 | 101.2 | 68.7 | 39.7 | 60.3 | 135.8 | 145.8 |
| SSA2LA | 30.0<br>35.8 | 188.1 | 120.8 | 47.3<br>42.1 | 0 | 0 | 55.4<br>51.6 | 51.8 | 27.3 | 30.4<br>28.4 | 153.1 | 171.0 |
| CPNLA | 78.2 | 115.2 | 1873.9 | 1620.0 | 65.0<br>66.6 | 114.0<br>113.9 | 34.9 | 34.9 | 0 | 0 | 1965.5<br>1868.1 | 219.6<br>222.0 |
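
To illustrate how a state-transition energy table can be used, the sketch below multiplies per-transition energies by transition counts and compares two designs. The energy values are copied from the low-power SSAFF and SAFF rows of table (a), in the table's own units. The function names and the two activity profiles are hypothetical, chosen only to contrast a clock-dominated TE with a data-dominated one.

```python
# Transition labels, in the column order of table (a).
FF_TRANSITIONS = [
    "000->100", "001->100", "010->111", "011->111",
    "100->000", "110->010", "101->001", "111->011",
    "000->010", "100->110", "101->111", "001->011",
    "010->000", "110->100", "111->101", "011->001",
]

# Per-transition energies from the low-power rows of table (a).
ENERGY = {
    "SSAFF-lp": dict(zip(FF_TRANSITIONS, [
        21.1, 92.2, 103.8, 21.2, 21.9, 101.8, 101.0, 21.9,
        115.9, 56.1, 43.2, 114.2, 103.1, 33.4, 37.4, 103.7])),
    "SAFF-lp": dict(zip(FF_TRANSITIONS, [
        65.8, 112.9, 118.0, 68.1, 53.9, 54.2, 59.8, 61.9,
        26.4, 28.3, 28.2, 26.5, 15.6, 17.0, 17.8, 15.6])),
}


def total_energy(design, counts):
    """Sum count * energy over the transitions observed for one TE."""
    table = ENERGY[design]
    return sum(table[t] * n for t, n in counts.items())


def lowest_energy_design(counts):
    """Return the design name with the smallest total for these counts."""
    return min(ENERGY, key=lambda design: total_energy(design, counts))


# Hypothetical activity profiles (counts are made up for illustration).
clock_dominated = {"000->100": 1000, "100->000": 1000}
data_dominated = {"000->010": 1000, "010->000": 1000}
```

For these two made-up profiles, `lowest_energy_design(clock_dominated)` returns `"SSAFF-lp"` and `lowest_energy_design(data_dominated)` returns `"SAFF-lp"`, which shows how the preferred design can flip with activity.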



#### **Test Waveforms**



- Test 1 and 2 : high clock activity, no data and output activity
- Test 3 and 4 : high data activity, no clock and output activity
- Test 5, 6, and 7 : high clock, data, and output activity (Traditional)
- Test 8 : high clock and data activity, no output activity

## **Energy Analysis**



## **Processor Design and Simulation**

- Evaluation on a microprocessor datapath
- Vanilla Pekoe Processor
  - A classic 32-bit MIPS RISC 5-stage pipeline with caches and system coprocessor registers (R3000-compatible)
  - Aggressive clock gating to save energy
  - 22 multi-bit flip-flops and latches, totaling 675 individual bits
- Simulation with 5 programs from the SPECint95 benchmarks
  - A fast cycle-accurate simulator [Krashinsky, Heo, Zhang, and Asanovic '00] that can count TE state transitions
  - 1.71 billion instructions and 2.69 billion cycles
- Some constraints
  - Cannot track the exact timing of signals
  - Cannot model glitches
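
Counting TE state transitions, as the simulator is described as doing, can be sketched as below. This is a hypothetical illustration, not the simulator's actual code: it assumes each TE's state is sampled as a three-character label like those in the energy tables, and it counts each change between consecutive samples. In line with the constraints listed above, it sees only the sampled states, so glitches and exact signal timing are invisible to it.

```python
from collections import Counter


def count_transitions(states):
    """Count state changes between consecutive samples of one TE.

    `states` is a sequence of state labels such as "000" or "110"
    (an assumed encoding). Samples that repeat the previous state
    are not counted as transitions.
    """
    return Counter(
        f"{prev}->{curr}"
        for prev, curr in zip(states, states[1:])
        if prev != curr
    )


# Made-up trace for illustration.
trace = ["000", "100", "000", "000", "010", "111", "011"]
counts = count_transitions(trace)
```

The resulting counts can then be weighted by a per-transition energy table to estimate the energy of each candidate design for that TE.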

#### **Flip-Flops and Latches in Processor**




## **Energy Breakdown**

Flip-flops (unit: mJ)

| Flip-flop | HLFF-hs | Lowest-energy design | Lowest energy |
|-----------|---------|----------------------|---------------|
| f_recovpc | 25.1 | SSAFF-lp | 3.57 |
| d_inst | 31.2 | SSAFF-lp | 6.52 |
| d_epc | 20.5 | SSAFF-lp | 2.74 |
| x_epc | 20.3 | SSAFF-lp | 2.62 |
| m_epc | 20.2 | SSAFF-lp | 2.55 |
| x_sd | 2.6 | SAFF-lp | 1.06 |
| x_addr | 8.0 | SAFF-lp | 2.57 |
| m_exe | 24.6 | SSAFF-lp | 4.76 |
| cp0_count | 42.6 | SSAFF-lp | 4.80 |
| cp0_comp | 0.1 | HLFF-lp | 0.03 |
| cp0_baddr | 0.3 | HLFF-lp | 0.18 |
| cp0_epc | 0.1 | HLFF-lp | 0.05 |

Latches (unit: mJ)

| Latch | PPCLA-hs | Lowest-energy design | Lowest energy |
|-------|----------|----------------------|---------------|
| p_pc | 3.22 | SSALA-lp | 2.25 |
| f_pc | 2.95 | SSALA-lp | 1.72 |
| d_rsalu | 3.27 | SSALA-lp | 3.16 |
| d_rtalu | 2.81 | SSALA-lp | 2.28 |
| d_rsshmd | 0.75 | PPCLA-lp | 0.70 |
| d_rtshmd | 0.65 | PPCLA-lp | 0.63 |
| d_aluctrl | 1.26 | SSALA-lp | 0.97 |
| m_exe | 3.88 | SSALA-lp | 3.65 |
| x_sdalign | 0.30 | SSA2LA-lp | 0.27 |
| w_result | 2.74 | SSALA-lp | 2.42 |

- 32-bit MIPS 5-stage pipeline datapath
- SPECint95 benchmarks: perl(test, primes), ijpeg(test), m88ksim(test), go(20,9), and lzw(medtest)
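
As a quick arithmetic check on the energy breakdown, the sketch below sums each column. The numbers are copied from the breakdown tables (in mJ); the variable and function names are hypothetical. These are plain column sums that ignore any timing constraint, so they are not expected to match the savings quoted on the results slides.

```python
# (uniform-design energy, lowest-energy-design energy) per TE, in mJ.
FLIP_FLOPS = {
    "f_recovpc": (25.1, 3.57), "d_inst": (31.2, 6.52), "d_epc": (20.5, 2.74),
    "x_epc": (20.3, 2.62), "m_epc": (20.2, 2.55), "x_sd": (2.6, 1.06),
    "x_addr": (8.0, 2.57), "m_exe": (24.6, 4.76), "cp0_count": (42.6, 4.80),
    "cp0_comp": (0.1, 0.03), "cp0_baddr": (0.3, 0.18), "cp0_epc": (0.1, 0.05),
}
LATCHES = {
    "p_pc": (3.22, 2.25), "f_pc": (2.95, 1.72), "d_rsalu": (3.27, 3.16),
    "d_rtalu": (2.81, 2.28), "d_rsshmd": (0.75, 0.70), "d_rtshmd": (0.65, 0.63),
    "d_aluctrl": (1.26, 0.97), "m_exe": (3.88, 3.65), "x_sdalign": (0.30, 0.27),
    "w_result": (2.74, 2.42),
}


def column_totals(table):
    """Return (uniform total, lowest-energy total) for one breakdown table."""
    uniform = sum(u for u, _ in table.values())
    lowest = sum(l for _, l in table.values())
    return uniform, lowest


ff_uniform, ff_lowest = column_totals(FLIP_FLOPS)
la_uniform, la_lowest = column_totals(LATCHES)
print(f"flip-flops: {ff_uniform:.2f} mJ uniform, {ff_lowest:.2f} mJ lowest")
print(f"latches:    {la_uniform:.2f} mJ uniform, {la_lowest:.2f} mJ lowest")
```

The flip-flop columns sum to about 195.6 mJ (HLFF-hs) and 31.45 mJ (lowest-energy), and the latch columns to about 21.83 mJ (PPCLA-hs) and 18.05 mJ (lowest-energy).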

## **Processor Energy Results - Flip-Flop**



- HS: Highest-Speed, LP: Lowest-Power
- Uniform: a single design used uniformly throughout a circuit
- Ref: Total datapath energy − Total TE energy = around 0.21 J


- 34% energy saving with conventional transistor sizing


[Figure labels: Uniform, HLFF-Sizing, HLFF-HSLE]

- HSLE: Activity-sensitive selection
- 52% energy saving over just transistor sizing with the best performance (HLFF-hs)
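
If the 34% figure is taken relative to the uniform HLFF-hs design and the 52% figure relative to the sized design (an assumption about how the two percentages are referenced), the two flip-flop savings compose multiplicatively:

$$
1 - (1 - 0.34)(1 - 0.52) = 1 - 0.66 \times 0.48 = 1 - 0.3168 \approx 0.68
$$

Under that reading, sizing plus activity-sensitive selection would leave roughly 32% of the uniform HLFF-hs flip-flop energy.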

#### **Processor Energy Results - Latch**



- 6.1% energy saving over just transistor sizing (1)
- 8.3% energy saving compared to homogeneous design with PPCLA-hs (2)
- PPCLA is the fastest and also very energy-efficient.

## **Summary of Energy Results**

- 63% TE energy saving compared to a homogeneous design with HLFF-hs and PPCLA-hs
- 46% TE energy saving compared to a design with conventional transistor sizing while keeping the best performance

## Conclusion

- ✓ We showed that activation patterns for various TEs in a circuit differ considerably.
- ✓ We found that there is wide variation in the optimal TE designs for different regimes.
- ✓ We provided complete energy and delay characterization.
- ✓ We applied our technique to a real processor, simulating 2.7 billion cycles of programs, and showed over 63% TE energy reduction without losing any performance.

#### Difficulty of using a heterogeneous mix of TEs?

- Designers already perform verification for each local clock, so the added complexity is minimal.

- Timing verification for non-critical TEs is simple.