# Design for MOSIS Educational Program (Research) Testchip for AHIP protocol, ASIC flow, and Leakage Control Through Body Biasing

Seongmoo Heo, Graduate Student, and Krste Asanović, Associate Professor

Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, 32-G776, Cambridge, MA 02139

#### Abstract

The Assam Test Chip 1 (ATC1) was fabricated in the TSMC 180 nm process with generous support from the MOSIS Educational Program (MEP). This paper reports the measurement methodology used for testing ATC1 and the measurement results.

### 1 Overview

Figure 1 shows the die photo of the ATC1 chip.



Figure 1: The die photo of the ATC1 chip.

As the die photo shows, ATC1 consists of two separate modules: an ASIC module implementing AHIP and a DSP kernel (RGBYCC translation) on the bottom, and a block of large custom-designed inverter chains on top (the white rectangle). We first describe the ASIC module (test methodology in Section 2, implementation in Section 3, and test results in Section 4) and then describe the inverter chains in Section 5. Section 6 summarizes this report.

### 2 ASIC: Test Methodology

Figure 2 shows the lab setup for testing ATC1: the host PC, PCI card (aka PLX), PLX daughtercard, tester baseboard, ATC1 daughtercard, and ATC1 chip. Figure 3 shows how the units in the system communicate and which interfaces and methods they use. Figure 4 shows where each block is located and how the blocks are connected in the test setup.

Because the hardware and software ATC1 implementations share the same test software, the test software should use only the exported ATC1 interface: reset(), read(), and write(). In software-only tests, read() and write() act on the Verilog model of ATC1. In tests with the hardware, read() and write() are invoked on a file handle for the /dev/plx device. The device driver passes the address and data to the Xilinx controller, which decodes and acts on them as necessary: some addresses are memory-mapped devices on the baseboard; others are passed through the AHIP (asynchronous host interface protocol) block on the Xilinx controller.

Our lab setup and testing methodology are similar to those of ATC0, our group's previous testchip [1].
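The idea of one test program running against either backend can be sketched as below. This is a minimal illustration, not the actual ATC1 test software: only the names reset(), read(), and write() come from the text, while `ATC1Interface`, `SoftwareModelBackend`, and `write_then_read_back` are hypothetical. The stand-in backend is a plain address-to-value store and does not reproduce the real register behaviour of the chip. A hardware backend wrapping the /dev/plx file handle is omitted because the driver's call convention is not described here.

```python
from typing import Protocol


class ATC1Interface(Protocol):
    """The three calls that test software is allowed to use."""

    def reset(self) -> None: ...
    def write(self, addr: int, data: int) -> None: ...
    def read(self, addr: int) -> int: ...


class SoftwareModelBackend:
    """Hypothetical stand-in for a software model: a plain address -> value store."""

    def __init__(self) -> None:
        self._mem: dict[int, int] = {}

    def reset(self) -> None:
        self._mem.clear()

    def write(self, addr: int, data: int) -> None:
        self._mem[addr] = data

    def read(self, addr: int) -> int:
        return self._mem.get(addr, 0)


def write_then_read_back(dev: ATC1Interface, addr: int, data: int) -> bool:
    """A backend-agnostic check: it touches only reset(), write(), and read()."""
    dev.reset()
    dev.write(addr, data)
    return dev.read(addr) == data
```

Because `write_then_read_back` depends only on the three exported calls, the same function could be handed any object that provides them, which is the property the shared test software relies on.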



Figure 2: The physical pieces of the ATC1 test rig: Host PC, PCI Card (aka the PLX), PLX Daughtercard, Tester Baseboard, ATC1 Daughtercard, and ATC1 chip.

### 3 ASIC: Implementation

The ASIC part consists of three modules: AHIP, the test harness, and the RGBYCC kernel (Figure 5). AHIP implements the protocol between the host and the RGBYCC kernel, and the test harness is the glue logic between AHIP and the kernel.



Figure 3: The modules and interfaces in the ATC1 test rig.



Figure 4: The connections and locations of the modules in the ATC1 test rig.



Figure 5: The overview of ATC1. On the bottom, the ASIC part is located.

#### 3.1 AHIP

AHIP is the data communication channel between the host and ATC1; data travels over an 8-bit-wide bidirectional bus. ATC1 is a slave device to the host. The host and ATC1 run on asynchronous clocks and use two bits, req and ack, to implement a handshake protocol.

#### 3.2 Test Harness

The test harness is the glue logic between AHIP and the RGBYCC kernel. On an AHIP write request, it copies ahip\_core to one of four registers: in0, in1, test0, or test1. Similarly, on an AHIP read request, it reads from out0 or out1 (Figure 5).

The bottom two bits of the address determine which register to write to or read from (Table 1).

The RGBYCC kernel reads from in0 and in1 alternately, one per cycle. Likewise, the output of the kernel is written to out0 and out1 alternately, one per cycle. The hardware check is done by comparing the output registers against the test registers using the comparator.

| Bottom two bits of address | Register to write to | Register to read from |
|----------------------------|----------------------|-----------------------|
| 00                         | in0                  | out0                  |
| 01                         | in1                  | out0                  |
| 10                         | test0                | out1                  |
| 11                         | test1                | out1                  |

Table 1: Interface registers for the RGBYCC kernel.
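The address decode in Table 1 can be sketched as two small lookup tables indexed by the bottom two address bits. The function and constant names are hypothetical; only the register names and the mapping come from the table.

```python
# Indexed by the bottom two bits of the address, as listed in Table 1.
WRITE_REGISTERS = ("in0", "in1", "test0", "test1")
READ_REGISTERS = ("out0", "out0", "out1", "out1")


def write_target(addr: int) -> str:
    """Name of the register that an AHIP write to `addr` would update."""
    return WRITE_REGISTERS[addr & 0b11]


def read_source(addr: int) -> str:
    """Name of the register that an AHIP read from `addr` would return."""
    return READ_REGISTERS[addr & 0b11]
```

Masking with `0b11` means the higher address bits are ignored in this sketch; whether the hardware ignores them is not stated in the text.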

#### 3.3 RGBYCC Kernel

The RGBYCC kernel translates a 24-bit RGB signal (8 bits per color) into a 24-bit YCrCb signal. The translation can be represented by the following linear equations.

$$y = (0x4c8b * r + 0x9646 * g + 0x1d2f * b) \gg 16 \tag{1}$$

$$cr = (((0x8000 * r - 0x6b2f * g - 0x14d1 * b) \gg 16) + 128) \& 0xff \tag{2}$$

$$cb = (((-0x2b33 * r - 0x54cd * g + 0x8000 * b) \gg 16) + 128) \& 0xff \tag{3}$$

$$ycc = (y \ll 16) + (cb \ll 8) + cr \tag{4}$$

For high throughput, the matrix multiplication is implemented with hundreds of adders, which are heavily pipelined.
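As a software cross-check, the translation equations can be written directly in Python. This is a sketch of the arithmetic only, not of the pipelined hardware. It uses 0x14d1 as the blue coefficient of cr, the value for which the three chroma coefficients sum to zero so that a gray input maps to cr = 128. It also relies on Python's `>>` being an arithmetic (floor) shift on negative integers, which is assumed to match the two's-complement behaviour of the hardware. The function name `rgb_to_ycc` is hypothetical.

```python
def rgb_to_ycc(r: int, g: int, b: int) -> int:
    """Pack Y, Cb, and Cr into 24 bits from 8-bit R, G, B inputs."""
    # 16-bit fixed-point coefficients; the luma coefficients sum to 0x10000.
    y = (0x4C8B * r + 0x9646 * g + 0x1D2F * b) >> 16
    # Chroma terms can go negative before the +128 offset is applied.
    cr = (((0x8000 * r - 0x6B2F * g - 0x14D1 * b) >> 16) + 128) & 0xFF
    cb = (((-0x2B33 * r - 0x54CD * g + 0x8000 * b) >> 16) + 128) & 0xFF
    return (y << 16) + (cb << 8) + cr


print(hex(rgb_to_ycc(128, 128, 128)))  # mid-gray -> 0x808080
print(hex(rgb_to_ycc(255, 0, 0)))      # pure red -> 0x4c54ff
```

Gray inputs leave both chroma channels at the 128 offset, which is a quick sanity check on the coefficient set.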

### 4 ASIC: Test Results

We tested the ASIC parts of 18 ATC1 chips and they all functioned correctly at nominal voltage and clock frequency.

#### 4.1 Test Application

In addition to the dynamic equality check by the hardware comparator, we check the functionality of the kernel from the host side by applying random vectors, reading the outputs from the kernel, and comparing them to expected values. Each test round performs the following steps:

1. Write a random RGB value to in0.
2. Write to test0 the expected YCC value for the RGB value stored in in0.
3. Write another random RGB value to in1.
4. Write to test1 the expected YCC value for the RGB value stored in in1.
5. Read out0 and check that it is correct.
6. Read out1 and check that it is correct.
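The six steps above can be sketched as a host-side loop. Everything other than the register names and the step order is an assumption made for illustration: `FakeATC1` is a hypothetical software stand-in that updates its outputs immediately instead of modelling the pipeline or the hardware comparator, the addresses use only the bottom two bits from Table 1, and the RGB word is assumed to hold R in bits 23-16, G in bits 15-8, and B in bits 7-0.

```python
import random


def rgb_to_ycc(rgb: int) -> int:
    """Reference conversion on a packed 24-bit RGB word (packing order assumed)."""
    r, g, b = (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF
    y = (0x4C8B * r + 0x9646 * g + 0x1D2F * b) >> 16
    cr = (((0x8000 * r - 0x6B2F * g - 0x14D1 * b) >> 16) + 128) & 0xFF
    cb = (((-0x2B33 * r - 0x54CD * g + 0x8000 * b) >> 16) + 128) & 0xFF
    return (y << 16) + (cb << 8) + cr


class FakeATC1:
    """Hypothetical stand-in for the chip: no pipeline, no comparator."""

    def __init__(self) -> None:
        names = ("in0", "in1", "test0", "test1", "out0", "out1")
        self.regs = {name: 0 for name in names}

    def write(self, addr: int, value: int) -> None:
        name = ("in0", "in1", "test0", "test1")[addr & 0b11]
        self.regs[name] = value
        # Model the kernel as producing its output as soon as an input arrives.
        if name == "in0":
            self.regs["out0"] = rgb_to_ycc(value)
        elif name == "in1":
            self.regs["out1"] = rgb_to_ycc(value)

    def read(self, addr: int) -> int:
        return self.regs[("out0", "out0", "out1", "out1")[addr & 0b11]]


def run_rounds(dev, rounds: int, seed: int = 0) -> int:
    """Run the six-step sequence `rounds` times; return the number of failed rounds."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(rounds):
        rgb0 = rng.getrandbits(24)
        dev.write(0b00, rgb0)                # step 1: in0
        dev.write(0b10, rgb_to_ycc(rgb0))    # step 2: test0
        rgb1 = rng.getrandbits(24)
        dev.write(0b01, rgb1)                # step 3: in1
        dev.write(0b11, rgb_to_ycc(rgb1))    # step 4: test1
        ok0 = dev.read(0b00) == rgb_to_ycc(rgb0)  # step 5: out0
        ok1 = dev.read(0b10) == rgb_to_ycc(rgb1)  # step 6: out1
        if not (ok0 and ok1):
            failures += 1
    return failures
```

Fixing the seed keeps a failing vector reproducible, which is useful when the same sequence is replayed against a different backend.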

#### 4.2 Schmoo Plot

Figure 6 shows the schmoo plot of the ASIC parts of the 18 ATC1 chips. The size of each circle represents the number of chips that passed a test of 40,000 random vectors at that operating point. Some chips reach 400 MHz at 1.8 V; 400 MHz is the maximum clock frequency supported by the Xilinx controller on the baseboard.
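The circle sizes in such a plot are per-point pass counts. A minimal sketch of that aggregation follows, assuming each test run is recorded as a `(chip_id, vdd_volts, freq_mhz, passed)` tuple. The record layout and the function name `shmoo_counts` are hypothetical, and the sample records are made-up values for illustration, not measurements.

```python
from collections import Counter


def shmoo_counts(results):
    """Count passing chips at each (supply voltage, clock frequency) point."""
    counts = Counter()
    for _chip_id, vdd_volts, freq_mhz, passed in results:
        if passed:
            counts[(vdd_volts, freq_mhz)] += 1
    return counts


# Made-up example records for two chips at two operating points.
example = [
    ("chip_a", 1.8, 400, True),
    ("chip_b", 1.8, 400, False),
    ("chip_a", 1.8, 300, True),
    ("chip_b", 1.8, 300, True),
]
counts = shmoo_counts(example)
```

Each key of `counts` is one point on the plot and each value is the quantity that the circle size would encode.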



Figure 6: Schmoo plot of 18 ATC1 chips.

### 5 Inverter Chain Experiment

Thirty-eight 61-stage inverter chains are used to test the feasibility of forward body biasing (FBB). Two selection bits control how many inverter chains are turned on. Power-optimal pipelining depends strongly on the threshold voltage levels, and FBB provides a cheap way of changing threshold voltages. FBB reduces the effective threshold voltages and, as a result, increases transistor speed while increasing leakage power exponentially. The body voltages of the inverter chains are not tied to supply and ground but are independently controlled. For NMOS body voltage control, the deep N-well layer of the TSMC 180 nm process is used. The NMOS and PMOS body voltages are controlled externally through analog pins.

The periods of the inverter chains were measured to be between 6.0 ns (167 MHz) and 7.7 ns (143 MHz).

Significant inductance in the package and bond wires caused ringing on the inverter chain output signals. While the inverter chain output pads produce 0 V to 3.3 V square pulses, the signals measured at the inverter chain output pins looked like sine waves swinging from about -1 V to 4.5 V.

#### 5.1 Threshold Voltage

Figure 7 shows the I-V curves of NMOS and PMOS transistors simulated using SPICE parameters obtained from measurements on a selected wafer of the run (TSMC T4BK\_MM\_NON\_EPI) on which the ATC1 chips were fabricated. The temperature was $25^{\circ}C$. The transistors are in saturation ($I_{ds} \propto (V_{gs} - V_t)^2$) and the square root of the current is plotted; therefore, the x-intercepts of the linearized curves represent the threshold voltages of the transistors.

The curves show that the NMOS transistors of the ATC1 chips are close to the slow corner and the PMOS transistors are close to the typical corner.
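The x-intercept extraction can be illustrated with a short sketch. In saturation, $\sqrt{I_{ds}}$ is linear in $V_{gs}$ and crosses zero at $V_{gs} = V_t$, so a straight-line fit to the square-rooted current recovers the threshold voltage. The function name and the synthetic data (an ideal square-law device with an assumed $V_t$ of 0.5 V and an arbitrary gain factor) are illustrative only and are not taken from the ATC1 SPICE model.

```python
import math


def threshold_from_saturation(vgs, ids):
    """Least-squares fit of sqrt(Ids) against Vgs; return the x-intercept (Vt)."""
    roots = [math.sqrt(i) for i in ids]
    n = len(vgs)
    mean_x = sum(vgs) / n
    mean_y = sum(roots) / n
    sxx = sum((x - mean_x) ** 2 for x in vgs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(vgs, roots))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The fitted line crosses zero where slope * Vgs + intercept = 0.
    return -intercept / slope


# Synthetic ideal square-law data: Ids = K * (Vgs - VT)^2 (assumed values).
K, VT = 2e-4, 0.5
vgs_points = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
ids_points = [K * (v - VT) ** 2 for v in vgs_points]
vt_estimate = threshold_from_saturation(vgs_points, ids_points)
```

With real curves, the fit would normally be restricted to the region where the square-law behaviour holds, since the curve bends away from a straight line near and below threshold.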



Figure 7: The simulated I-V curves of the transistor SPICE model extracted from measurements of the TSMC T4BK\_MM\_NON\_EPI run. The drain and source voltages were set to 1.8 V and 0 V, respectively.

#### 5.2 Forward Body Biasing

FBB usually results in a better power-delay curve under supply voltage scaling when leakage power is small compared with active power. As a rule of thumb, the best power-delay tradeoff is achieved when leakage power is around 30% of total power.

The left-hand plot of Figure 8 compares the power-delay curve with no body biasing (zero body bias, ZBB) against the curve with 0.45 V forward body biasing applied to both the NMOS and PMOS transistors of the inverter chains. FBB gives a better power-delay tradeoff under supply voltage scaling. The right-hand plot shows the power-delay curves of the RGBYCC kernel with zero and forward (0.45 V) body biasing. The FBB results for the kernel are extrapolated from the inverter chain data.



Figure 8: The power-delay curves of inverter chains and RGBYCC kernel.

However, the FBB experiment was not very successful with respect to leakage current. Because the base leakage with no body bias was too small, the leakage current was not measurable regardless of the amount of body bias. One reason the leakage was immeasurably small was that the temperature was too low: the combination of 38 inverter chains, each with 61 stages, was not enough to generate significant power and heat.

We therefore increased the body voltages above the diode turn-on voltage and measured the currents drawn from VPB (PMOS body voltage), VNB (NMOS body voltage), and VDD (supply voltage) while forcing the inverter chains to sleep by driving a zero into the NAND gate (each inverter chain consists of 60 inverters and 1 NAND gate).

Figure 9 shows the measured currents from VPB, VNB, and VDD while varying the body biasing of NMOS transistors.



Figure 9: The measured currents from VPB, VNB, and VDD while varying the body voltage of the NMOS transistors (VNB).

Around the diode turn-on voltage, the current from VNB increases exponentially because the intrinsic diode between the well connected to the body bias and the transistor junction is turned on. After the diode is fully turned on, the current becomes linearly proportional to the voltage because of the well resistance.

The current from the supply voltage also increases, since the intermediate voltages that the body bias produces at the transistor junctions slightly turn on both the PMOS and NMOS transistors. The large current from VPB is notable. It arises because the intrinsic NPN bipolar transistor formed by the NMOS transistor junction (N), PWELL (P), and NWELL (N) is turned on and draws current from the NWELL, and therefore from VPB.

Increasing the body bias of the PMOS transistors above the diode turn-on voltage was not very successful because it often caused latch-up, which shorts the supply voltage to ground and, as a result, burns the chip.

Figure 10 shows the simulated currents from VPB, VNB, and VDD while varying the body biasing at a temperature of $25^{\circ}C$. The TT corner (typical NMOS, typical PMOS) was assumed.

The simulation results look different from the measurements mainly because the SPICE model lacks models of the well resistors and of the intrinsic bipolar transistors formed between wells and junctions. Since we focused on static current and power, the lack of intrinsic well capacitors could be ignored.

Figure 10: The simulated currents from VPB, VNB, and VDD while varying body voltages (Hspice simulation). 25 Celsius, TT corner.

### 6 Summary

We verified that the ASIC part of the ATC1 chip, comprising AHIP and the RGBYCC kernel, works correctly, with some chips running at up to 400 MHz.

For high-speed I/O, high-quality bond wires and a package with low parasitic inductance are necessary.

At room temperature, leakage current is not significant in the TSMC 180 nm process, even with large forward body biasing. Increasing the body voltage above the diode turn-on voltage can be dangerous because it can lead to latch-up.

# References

[1] Krste Asanović, Kenneth Barr, James Beck, Brian Pharris, and Michael Zhang. ATC0 Test Strategy. 2004.