

## Could We Make SSDs Self-Healing?

### **Tong Zhang**

### Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute

Google/Bing: "tong rpi"

Flash Memory Summit

## Introduction and Motivation

[Figure: NAND flash as a hot topic. One panel shows the storage hierarchy (registers and cache, main memory, hard disk and RAID, tape and optical storage) alongside application, mail, and database servers connected over LAN, WAN, and SAN. Another panel shows NAND flash technology scaling from 100 nm toward 10 nm over 2007 to 2014 for IMFT, Samsung, Toshiba, and Hynix, with an analyst estimate and the resulting bit cost reduction.]



## In-Depth Knowledge of Storage Media

[Figure: hard disk drive controller and write/read channel, annotated with the device characteristics it relies on (non-linear distortions, bit response, media defects, media noise, head noise). Panels illustrate emerging recording technologies: shingled recording with 2-dimensional read (TDMR), microwave-assisted recording (MAMR), bit-patterned media (BPM, 1 bit = 1 island), and heat-assisted recording (HAMR).]

Flash Memory Summit, Santa Clara, CA














## Self-Healing SSD?

Explicitly leverage this device wear-out recovery phenomenon in the FTL:

- Rethink how to utilize existing over-provisioning
- Keep track of the history of ambient temperature
- Intentionally operate SSDs at a higher ambient temperature
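One way the temperature-history bookkeeping could look is sketched below. It only accumulates time spent in fixed-width temperature bins and deliberately makes no claim about how recovery depends on temperature. The class name `TemperatureHistory`, the 5 °C bin width, and the per-device granularity are all illustrative assumptions, not something the slides specify.

```python
from collections import defaultdict


class TemperatureHistory:
    """Hypothetical helper: time-at-temperature histogram for one device."""

    def __init__(self, bin_width_c=5.0):
        self.bin_width_c = bin_width_c          # assumed bin width in deg C
        self.seconds_in_bin = defaultdict(float)  # bin start (deg C) -> seconds

    def record(self, temp_c, duration_s):
        """Add duration_s seconds observed at temperature temp_c."""
        if duration_s < 0:
            raise ValueError("duration must be non-negative")
        bin_start = (temp_c // self.bin_width_c) * self.bin_width_c
        self.seconds_in_bin[bin_start] += duration_s

    def seconds_at_or_above(self, threshold_c):
        """Total time in bins starting at or above threshold_c.

        threshold_c should sit on a bin edge; otherwise the partially
        covered bin below it is excluded.
        """
        return sum(s for start, s in self.seconds_in_bin.items()
                   if start >= threshold_c)


if __name__ == "__main__":
    history = TemperatureHistory()
    history.record(47.0, 100.0)
    history.record(62.5, 50.0)
    history.record(44.9, 10.0)
    print(history.seconds_at_or_above(45.0))
```

An FTL could feed a summary like this into whatever recovery model it trusts; the histogram keeps the storage cost constant regardless of how long the device has been in service.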







- Impact of data backup on system performance



## **Thermal Simulation Setup**

HotSpot thermal modeling of the 3D chip structure and its heat dissipation path.

[Figure: 3D chip structure and heat dissipation path. Labelled elements: encapsulation, heating die, data dies, bonding wire, silicon substrate, adhesive layer, metal ILD, active silicon, thermal interface material, interposer, solder ball, PCB.]

3D chip setup:

| Layer | Thermal conductivity | Thickness |
| --- | --- | --- |
| Encapsulation | 0.453 W/(m·K) | 1.0 mm |
| Silicon substrate | 100 W/(m·K) | 50 µm |
| Adhesive material | 4 W/(m·K) | 4 µm |
| Metal ILD | 200 W/(m·K) | 8 µm |
| Active silicon | 100 W/(m·K) | 2 µm |
| Thermal interface material | 4 W/(m·K) | 20 µm |
| Interposer | 2 W/(m·K) | 0.4 mm |
| Solder balls | 16.7 W/(m·K) | 0.94 mm |
| PCB | 3 W/(m·K) | 2 mm |
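As a back-of-envelope companion to the layer parameters above, the sketch below computes each layer's one-dimensional conduction resistance per unit area, $t/k$, and ranks the layers. This is not what HotSpot does (it solves a full thermal RC network); the sketch ignores lateral heat spreading and the actual stacking order, and it assumes the 0.94 mm thickness belongs to the solder balls, as the flattened table suggests.

```python
# Layer data transcribed from the setup table:
# (name, thermal conductivity in W/(m*K), thickness in m)
LAYERS = [
    ("Encapsulation", 0.453, 1.0e-3),
    ("Silicon substrate", 100.0, 50e-6),
    ("Adhesive material", 4.0, 4e-6),
    ("Metal ILD", 200.0, 8e-6),
    ("Active silicon", 100.0, 2e-6),
    ("Thermal interface material", 4.0, 20e-6),
    ("Interposer", 2.0, 0.4e-3),
    ("Solder balls", 16.7, 0.94e-3),  # thickness pairing is an assumption
    ("PCB", 3.0, 2.0e-3),
]


def area_resistance(conductivity, thickness):
    """1-D slab conduction resistance per unit area, t / k, in m^2*K/W."""
    return thickness / conductivity


def ranked_resistances(layers=LAYERS):
    """Return (name, t/k) pairs sorted from most to least resistive."""
    rows = [(name, area_resistance(k, t)) for name, k, t in layers]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, resistance in ranked_resistances():
        print(f"{name:28s} {resistance:.3e} m^2*K/W")
```

Under these simplifications the encapsulation, PCB, and interposer dominate, while the thin silicon and metal layers contribute almost nothing, which is consistent with heat generated inside the die stack being comparatively easy to retain.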





- Power consumption ranges from 1.5 W to 5.1 W as the temperature goes from 110 °C to 250 °C
- Choose 200 °C as the target heating temperature
- ~35 minutes for 80% of interface state traps to recover
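The slide gives power only at the two ends of the temperature range, not at the chosen 200 °C target. A rough energy budget for one 35-minute heating session can still be bracketed, as in the sketch below. The linear interpolation between the two reported points is purely an assumption made here for illustration; the real power-versus-temperature curve may well be non-linear.

```python
def interpolate_power(temp_c, t_lo=110.0, p_lo=1.5, t_hi=250.0, p_hi=5.1):
    """Heating power in watts at temp_c.

    ASSUMPTION: linear interpolation between the two reported points
    (110 C, 1.5 W) and (250 C, 5.1 W).
    """
    if not t_lo <= temp_c <= t_hi:
        raise ValueError("temperature outside the reported range")
    fraction = (temp_c - t_lo) / (t_hi - t_lo)
    return p_lo + fraction * (p_hi - p_lo)


def heating_energy_joules(power_w, minutes):
    """Energy for holding a constant power for the given number of minutes."""
    return power_w * minutes * 60.0


if __name__ == "__main__":
    minutes = 35.0
    low = heating_energy_joules(1.5, minutes)    # bound from the 110 C point
    high = heating_energy_joules(5.1, minutes)   # bound from the 250 C point
    estimate = heating_energy_joules(interpolate_power(200.0), minutes)
    print(f"bounds: {low:.0f} J to {high:.0f} J")
    print(f"interpolated estimate at 200 C: {estimate:.0f} J")
```

The bounds depend only on the reported figures, provided power rises monotonically with temperature; the interpolated value in between should be read as a placeholder, not a measurement.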



## P/E Cycling Endurance Improvement

- Maximum allowable worst-case raw read BER: 2.04e-3
- 10-year retention limit
- Includes trap recovery at normal operating temperature (45 °C)
- Self-healing trigger BER: 1.50e-3
- Three times longer cooling time



P/E cycling endurance: $3000 \rightarrow 17400$ cycles (a 5.8× improvement)
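The two BER thresholds above suggest a simple controller-side decision rule, sketched below. The thresholds come from the slide; everything else is hypothetical, including the `Action` names, the `heal_available` flag, and the choice to retire a chip at the limit. The sketch also assumes the BER being compared is the projected worst-case value (that is, already accounting for the 10-year retention limit).

```python
from enum import Enum

SELF_HEAL_TRIGGER_BER = 1.50e-3  # from the slide
MAX_ALLOWABLE_BER = 2.04e-3      # from the slide


class Action(Enum):
    """Hypothetical controller actions, named for illustration only."""
    CONTINUE = "continue"
    SELF_HEAL = "self_heal"
    RETIRE = "retire"


def next_action(worst_case_raw_ber, heal_available=True):
    """Pick an action for a chip given its projected worst-case raw BER."""
    if worst_case_raw_ber >= MAX_ALLOWABLE_BER:
        return Action.RETIRE       # assumed response once the limit is reached
    if worst_case_raw_ber >= SELF_HEAL_TRIGGER_BER and heal_available:
        return Action.SELF_HEAL    # heat the chip before the limit is reached
    return Action.CONTINUE


if __name__ == "__main__":
    for ber in (1.0e-3, 1.6e-3, 2.1e-3):
        print(f"{ber:.2e} -> {next_action(ber).value}")
```

Setting the trigger below the allowable limit leaves headroom for the chip to keep serving reads while its data is backed up and the heating session is scheduled.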



## Data Backup During Chip Self-Heating







- DiskSim simulator with the SSD model patch
- 2 dies/chip, an 8-bit I/O bus, and a number of common control buses
- 2 planes/die, 2048 blocks/plane, 64 pages/block, 8 sectors/page, 512 bytes/sector
- 2 channels, 17 flash chips/channel including one backup chip
- The backup chip on each channel backs up only the data of the flash chips on that same channel
- ONFI 2.0, 133 MB/s, read access time 50 µs, program time 600 µs
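The geometry above fixes the capacities involved in a backup, which the sketch below works out. The class `FlashConfig` and its property names are made up for illustration. Two assumptions are baked in: "133 MB/s" is read as 133 × 10^6 bytes per second, and the transfer-time figure covers a single pass of one full chip over the channel bus, ignoring read and program latencies and any parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashConfig:
    """Simulated SSD geometry from the slide (names are illustrative)."""
    channels: int = 2
    chips_per_channel: int = 17        # includes the backup chip
    backup_chips_per_channel: int = 1
    dies_per_chip: int = 2
    planes_per_die: int = 2
    blocks_per_plane: int = 2048
    pages_per_block: int = 64
    sectors_per_page: int = 8
    bytes_per_sector: int = 512
    bus_bytes_per_s: float = 133e6     # ASSUMPTION: MB = 10**6 bytes

    @property
    def page_bytes(self):
        return self.sectors_per_page * self.bytes_per_sector

    @property
    def chip_bytes(self):
        pages = (self.dies_per_chip * self.planes_per_die
                 * self.blocks_per_plane * self.pages_per_block)
        return pages * self.page_bytes

    @property
    def data_capacity_bytes(self):
        data_chips = self.channels * (self.chips_per_channel
                                      - self.backup_chips_per_channel)
        return data_chips * self.chip_bytes

    @property
    def one_pass_bus_seconds(self):
        """Time to move one full chip across the channel bus once."""
        return self.chip_bytes / self.bus_bytes_per_s


if __name__ == "__main__":
    cfg = FlashConfig()
    print(f"page: {cfg.page_bytes} B, chip: {cfg.chip_bytes / 2**30:.0f} GiB")
    print(f"data capacity: {cfg.data_capacity_bytes / 2**30:.0f} GiB")
    print(f"one bus pass per chip: {cfg.one_pass_bus_seconds:.1f} s")
```

Because the source chip and its backup chip share a channel, a full copy would likely cross the bus twice (out of the source, into the backup), so the single-pass figure is best read as a lower bound on the bus time a backup consumes.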







- Continuous technology scaling demands true device-aware SSD system design
- How to exploit memory cell wear-out recovery?
  - Explicitly leverage the wear-out recovery phenomenon in the FTL
  - A more aggressive scenario: self-heating NAND flash memory chips
    - SSD controller scheduling and data backup strategy
    - Simulation based on detailed thermal, flash memory cell, and SSD system modeling
- Comprehensive cross-layer optimization: an open question