#### Evaluating MTJ Benefits and Utilizing Context-Switched Hardware for Designing Magnetologic Circuits

Michael Hall

Advisor: Dr. Roger Chamberlain

Co-advisor: Dr. Viktor Gruev

Washington University in St. Louis

May 23, 2013

#### Magnetic Tunnel Junction (MTJ) Uses



#### Outline

- Introduction & Motivation
- Clocking Research
- Context-Switching Research
- Conclusion

### Magnetic Tunnel Junction (MTJ)



- Thin-film magnetic device
- Set via field or current
- Read via resistance output

#### Modern clock distribution

- Clock distribution tree (on-chip)
  - Predominant way to distribute the clock
  - Use tree-like structure and clock buffers to balance signal propagation to every flip-flop in synchronous logic circuits



- <u>Power consumption</u>: > 25% on modern processors [Ranganathan 2007]
- <u>Clock skew</u>: 3.8% in CELL processor at 3.2 GHz [Ranganathan 2007]
- Area: Also significant

### Global magnetic clocking



#### I propose to:

- Investigate global magnetic field with MTJs for clock distribution
  - Similar to optical clock distribution in free space described by [Goodman et al. 1984].
- Design a resistance-to-voltage read circuit for sensing MTJ resistance

# Logic



- Logic design using MTJs (also called "magnetologic")
- No one way to design
- Latching of logic output
  - Pipeline nature of logic
- Want to exploit latching property

#### C-slow transformation [Leiserson and Saxe 1991]



- Replace every register with C registers.
- Improve clock frequency if registers evenly distributed (accomplished via retiming).
- C pipeline stages can process C interleaved data streams.
- C-slow transformation allows circuit to be context-switched.
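The interleaving behavior listed above can be illustrated with a minimal Python sketch. It assumes a hypothetical circuit with a single feedback register; the update function `step`, the helper names, and the example streams are illustrative assumptions and do not come from the cited work. Replacing the one register with a chain of C registers lets C independent streams share the same logic, one stream per clock cycle.

```python
from collections import deque


def step(state, x):
    """Hypothetical combinational logic: next state from state and input."""
    return (3 * state + x) % 251


def run_original(stream, init=0):
    """Original circuit: one register in the feedback loop, one stream."""
    state, out = init, []
    for x in stream:
        state = step(state, x)
        out.append(state)
    return out


def run_c_slow(streams, init=0):
    """C-slowed circuit: the register becomes a chain of C registers.

    Assumes all streams have the same length. Inputs are interleaved
    round-robin, so stream k owns clock cycles k, k + C, k + 2C, ...
    """
    c = len(streams)
    regs = deque([init] * c)           # chain of C registers
    outs = [[] for _ in range(c)]
    for t in range(c * len(streams[0])):
        ctx = t % c                    # stream that owns this cycle
        x = streams[ctx][t // c]
        state = regs.popleft()         # value written C cycles earlier
        new_state = step(state, x)
        regs.append(new_state)
        outs[ctx].append(new_state)
    return outs


if __name__ == "__main__":
    streams = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(run_c_slow(streams) == [run_original(s) for s in streams])
```

Each stream sees exactly the results it would get from the original circuit running alone, which is the property that makes a C-slowed circuit context-switchable.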

#### Benefits of C-slow

1. Concurrently use hardware for multiple streams
2. Increase in clock frequency
3. Increase in total throughput when resource limited

#### Related work

- C-slow
  - Weaver et al. 2003 applied C-slow to achieve speedup on specialized hardware.

#### Context-switched hardware

- I propose to investigate context-switching in hardware.
- Types of context-switching:
  - Fine-grain: C-slow
    - Ex. Hyper-threading on a processor.
  - Coarse-grain: store contexts in secondary memory
    - Ex. Operating system context-switch of a running program.
- Applicable to FPGA, ASIC, and magnetologic technologies.


# Clocking research

- Fabricated a test chip
  - Test on-chip resistance-to-voltage read circuit
  - Test global clocking circuits
- Will use read circuit results in a model for a standard cell design to emulate magnetic clocking.

## Clocking research questions

1. What are the power, speed, area, and jitter of an <u>MTJ read circuit</u>?

2. What is the tradeoff in power, speed, and area between magnetic global clocking and on-chip clock distribution?

#### Related work: read circuit

- Current conveyor with current comparator
  - Au et al., "A novel current-mode sensing scheme for magnetic tunnel junction MRAM," *IEEE Transactions on Magnetics*, vol. 40, no. 2, pp. 483–488, Mar. 2004.

# Proposed: Design, layout, and simulation of MTJ read circuit across process technologies

- 3M2P 0.5 µm and 5M1P 0.18 µm process technologies
- Characterized this read circuit in [Hall et al. 2012].





# Proposed: Design, layout, simulation, and fabrication of test chip

#### MTJ-CMOS Wirebonded System



#### **CMOS Test Chip Diagram**





### Proposed: Testing the chip

- Build testing infrastructure
  - Build PCB
  - Write FPGA firmware
  - Write PC software
  - Troubleshoot test setup
- Stimulate and test fabricated chip
  - Measure power and speed of the MTJ read circuit.
  - Test global clocking circuits.

#### PCB layout



### Proposed: Compare tradeoffs between on-chip clock distribution and magnetic clocking

Measure power, area, and speed of standard cell designs.

- For magnetic clocking:
  - Create custom standard cell of MTJ read circuit.
  - Replace top-level clock buffers with custom cell and rip out top-level clock routing.

#### Status of work

- Noise analysis of current conveyor circuit [Hall et al. 2011].
- Simulations of complete MTJ read circuit design [Hall et al. 2012].
- Fabricated test chip and PCB.
- **TBD:** Populate PCB, write FPGA firmware and PC software.
- **TBD:** Test global clocking in chip and measure MTJ read circuit.
- **TBD:** Compare tradeoffs between on-chip clock distribution and magnetic clocking.


#### Context-switching research questions

How do we build effective context-switchable hardware?

1. <u>What are suitable models of the performance and resource utilization of context-switchable hardware?</u>
2. <u>What are good guidelines for choosing between designs?</u>
3. <u>What are optimal schedules for context-switching</u>, including when there is a tradeoff between throughput and latency?

## Property of magnetologic

- The red box is a magnetologic gate.
- It has a latching property at its output.



- Context-switching can be used for magnetologic but is not dependent on it.
- It can also be used in FPGAs and ASICs.



Fig. 4. Magneto-logic gate (a) AND, (b) OR, (c) NOR, (d) NAND, (e) XOR, and (f) XNOR.

S. Lee, S. Choa, S. Lee, and H. Shin, "Magneto-logic device based on a single-layer magnetic tunnel junction," *IEEE Transactions on Electron Devices*, vol. 54, no. 8, pp. 2040–2044, Aug. 2007.

#### Context-switching in hardware



Dashed rectangles are gates in magnetologic and combinational logic (CL) followed by a memory element in CMOS.

#### Workload Characteristics



# Design Space



#### Interfaces





#### Schedules

- Fixed schedule
- Fixed schedule with secondary memory
- Dynamic schedule with secondary memory
  - Data availability
  - Oldest data available first
  - Round robin + data availability
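The three dynamic policies listed above can be contrasted with a small Python sketch. Everything in it is an illustrative assumption rather than the proposed hardware design: each context has a queue of `(arrival_time, data)` items, all data is taken as already arrived, and the function names are hypothetical.

```python
from collections import deque


def pick_data_available(queues, last):
    """Data availability: lowest-numbered context with data waiting."""
    for ctx, q in enumerate(queues):
        if q:
            return ctx
    return None


def pick_oldest_first(queues, last):
    """Oldest data available first: context whose head item arrived earliest."""
    ready = [(q[0][0], ctx) for ctx, q in enumerate(queues) if q]
    return min(ready)[1] if ready else None


def pick_round_robin(queues, last):
    """Round robin + data availability: next context after `last` with data."""
    n = len(queues)
    for offset in range(1, n + 1):
        ctx = (last + offset) % n
        if queues[ctx]:
            return ctx
    return None


def schedule(queues, policy, slots):
    """Return the order in which contexts are served for up to `slots` slots."""
    queues = [deque(q) for q in queues]   # copy so the caller's data is kept
    last, order = -1, []
    for _ in range(slots):
        ctx = policy(queues, last)
        if ctx is None:                   # no context has data left
            break
        queues[ctx].popleft()
        order.append(ctx)
        last = ctx
    return order


if __name__ == "__main__":
    queues = [[(5, "a"), (6, "b")], [], [(1, "c"), (9, "d")]]
    for policy in (pick_data_available, pick_oldest_first, pick_round_robin):
        print(policy.__name__, schedule(queues, policy, slots=4))
```

The sketch only shows how the selection rules differ; it does not model secondary-memory transfer cost, which a real schedule would also have to account for.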

### Proposed: Workload applications

1. Synthetic Cosine Feedback Function
2. AES Encryption Cipher (CBC Block Mode)
3. Future real-world application to be developed at Velocidata this summer



#### Proposed: Exploration of design space

- <u>Interfaces</u>: parallel, single tagged
- <u>Schedules</u>: fixed, dynamic
- <u>Implementation</u>: fine-grain, coarse-grain

# Proposed: Model performance and resource utilization of context-switched hardware

- Why do this: To guide the design process
- <u>Inputs</u>: workload, design, technology
- <u>Outputs</u>: e.g., area, power, clock frequency, stream latency, total throughput, utilization of pipeline, throughput-area efficiency, limit on C, cost of a context-switch, etc.

### Total throughput model

| Parameter | Description                                          |
|-----------|------------------------------------------------------|
| $t_{CL}$  | Propagation time of combinational logic              |
| WL        | Workload parameters (e.g., Nr = number of rounds)    |
| $C$       | Number of pipeline stages                            |
| $\sigma$  | Standard deviation of delay between pipeline stages  |
| $T_{TOT}$ | Predicted total throughput                           |

#### C-slow example:

$$T_{TOT} = \frac{1}{\text{Clock Period}}$$

$$\text{Clock Period} = t_{CL}(\text{WL}) \cdot \frac{1}{C} + \sigma(\text{WL}) \cdot \sqrt{C - 1}$$
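As a minimal sketch, the C-slow throughput model can be written directly in Python. The function and argument names are illustrative assumptions. `t_cl` and `sigma` stand for $t_{CL}(\text{WL})$ and $\sigma(\text{WL})$ already evaluated for a given workload, in whatever time unit the caller chooses.

```python
import math


def clock_period(t_cl, sigma, c):
    """Clock period = t_CL / C + sigma * sqrt(C - 1), for C >= 1 stages."""
    if c < 1:
        raise ValueError("C must be at least 1")
    return t_cl / c + sigma * math.sqrt(c - 1)


def total_throughput(t_cl, sigma, c):
    """Predicted total throughput T_TOT = 1 / clock period."""
    return 1.0 / clock_period(t_cl, sigma, c)


if __name__ == "__main__":
    # Made-up numbers purely to exercise the formula.
    for c in (1, 2, 5, 10):
        print(c, clock_period(8.0, 1.0, c), total_throughput(8.0, 1.0, c))
```

With C = 1 the second term vanishes and the period is just the combinational delay; as C grows, the first term shrinks while the stage-imbalance term grows, so the model predicts diminishing returns.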

#### Curve fit of total throughput for AES

#### Workload Application: AES Encryption Cipher (CBC block mode)



Curve fit equation:

$$T_{TOT} = 1 \Big/ \left[ \left( 5.772\ \tfrac{\text{ns}}{\text{rnd}} \cdot Nr \right) \cdot \frac{1}{C} + \left( 1.672\ \text{ns} - 53.14\ \tfrac{\text{ps}}{\text{rnd}} \cdot Nr \right) \cdot \sqrt{C - 1} \right]$$
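The fitted equation can be evaluated numerically, for instance to see which C the fit favors for a given number of rounds. The sketch below is an assumption-laden illustration: the function names and the search limit `c_max` are hypothetical, the 53.14 ps/rnd coefficient is converted to 0.05314 ns/rnd so that all times are in nanoseconds, and the fit may not hold far outside the range of C that was actually measured.

```python
import math


def aes_clock_period_ns(nr, c):
    """Fitted clock period in ns for Nr rounds and C pipeline stages."""
    t_cl = 5.772 * nr                  # ns, combinational delay term
    sigma = 1.672 - 0.05314 * nr       # ns, stage-imbalance term
    return t_cl / c + sigma * math.sqrt(c - 1)


def aes_throughput_per_ns(nr, c):
    """Fitted total throughput, in results per ns."""
    return 1.0 / aes_clock_period_ns(nr, c)


def best_c(nr, c_max=64):
    """Integer C in [1, c_max] that maximizes the fitted throughput."""
    return max(range(1, c_max + 1),
               key=lambda c: aes_throughput_per_ns(nr, c))


if __name__ == "__main__":
    nr = 10                            # AES-128 uses 10 rounds
    c = best_c(nr)
    print(c, aes_clock_period_ns(nr, c), aes_throughput_per_ns(nr, c))
```

Multiplying the per-nanosecond result by 1000 gives a rate in MHz.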

#### Status of work

- Developed two workload applications: Synthetic Cosine, AES.
- Developed parallel interface, C-slow for both workloads.
- Completed a case study comparing replication vs. parallel interface.
- **TBD:** Further development of context-switched hardware.
- **TBD:** Develop models of performance and resource utilization.
- **TBD:** Determine set of guidelines for design selection.
- **TBD:** Write tool to generate context-switched hardware.


### Conclusion

- Will investigate magnetic global clocking
  - Develop a resistance-to-voltage read circuit.
  - Case study to evaluate potential benefit (power, area, clock skew) of magnetic global clocking
- Will investigate context-switched hardware
  - Model performance and resource utilization and develop set of guidelines for design selection
  - Build a tool to automatically context-switch hardware

## Bibliography

- C. Lin, *et al.*, "45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell," in 2009 IEEE International Electron Devices Meeting (IEDM). IEEE, 2009, pp. 1–4.
- N. Ranganathan and N. Jouppi, "Evaluating the potential of future on-chip clock distribution using optical interconnects," Hewlett-Packard Development Company, Tech. Rep. HPL-2007-163, Oct. 2007.
- J. Goodman, F. Leonberger, S.-Y. Kung, and R. Athale, "Optical interconnections for VLSI systems," *Proceedings of the IEEE*, vol. 72, no. 7, pp. 850–866, Jul. 1984.
- C. Leiserson and J. Saxe, "Retiming synchronous circuitry," *Algorithmica*, vol. 6, pp. 5–35, Jun. 1991. [Online]. Available: http://dx.doi.org/10.1007/BF01759032
- N. Weaver *et al.*, "Post-placement C-slow retiming for the Xilinx Virtex FPGA," in *Proceedings of the 2003 ACM/SIGDA 11th Int. Symp. on FPGAs*, ser. FPGA '03. New York, NY, USA: ACM, 2003, pp. 185–194.
- M. J. Hall, V. Gruev, and R. D. Chamberlain, "Noise analysis of a current-mode read circuit for sensing magnetic tunnel junction resistance," in *2011 IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2011.
- E. K. S. Au, W.-H. Ki, W. H. Mow, S. T. Hung, and C. Y. Wong, "A novel current-mode sensing scheme for magnetic tunnel junction MRAM," *IEEE Transactions on Magnetics*, vol. 40, no. 2, pp. 483–488, Mar. 2004.
- M. J. Hall, V. Gruev, and R. D. Chamberlain, "Performance of a resistance-to-voltage read circuit for sensing magnetic tunnel junctions," in *2012 IEEE 55th Int. Midwest Symp. on Circuits and Syst. (MWSCAS)*, Aug. 2012, pp. 639–642. [Online]. Available: http://dx.doi.org/10.1109/MWSCAS.2012.6292101
- S. Lee, S. Choa, S. Lee, and H. Shin, "Magneto-logic device based on a single-layer magnetic tunnel junction," *IEEE Transactions on Electron Devices*, vol. 54, no. 8, pp. 2040–2044, Aug. 2007.