

### Using a PCIe-Based Switch Module to Enhance Enterprise Storage Architecture

Chi-Lin Tom, Memoright (chilintom@memoright-usa.com)

Flash Memory Summit 2015 Santa Clara, CA



### Legacy Storage Solution Architecture


2U Server: 16 slots (12 HDD and 4 SSD)






#### 1 HDD = 6 TB of Storage



## Drawbacks of Legacy Storage Solution

- The HBA (Host Bus Adapter) quickly becomes a bottleneck and contention point.
- Some HDD slots (typically 4) are reserved for SSDs used as cache, which reduces storage capacity.
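
The capacity cost of reserving slots can be estimated with simple arithmetic. The sketch below uses the figures from the legacy slide (16 slots, 4 of them holding SSD cache, 6 TB per HDD). The assumption that every slot could hold a 6 TB HDD if the cache lived elsewhere is illustrative and not a measured result.

```python
# Illustrative capacity arithmetic for the 2U, 16-slot chassis.
# Figures come from the slides; the "all slots hold HDDs" case is an assumption.
SLOTS = 16            # drive slots in the 2U server
SSD_CACHE_SLOTS = 4   # slots reserved for SSD cache in the legacy layout
HDD_TB = 6            # capacity of one HDD in TB


def raw_capacity_tb(hdd_slots, tb_per_hdd=HDD_TB):
    """Raw (pre-RAID, pre-filesystem) capacity in TB."""
    return hdd_slots * tb_per_hdd


legacy_tb = raw_capacity_tb(SLOTS - SSD_CACHE_SLOTS)  # 12 HDDs
all_hdd_tb = raw_capacity_tb(SLOTS)                   # 16 HDDs (assumed)
lost_tb = all_hdd_tb - legacy_tb

print(f"legacy: {legacy_tb} TB, all-HDD: {all_hdd_tb} TB, lost to cache slots: {lost_tb} TB")
```

Under these assumptions the four cache slots cost 24 TB of raw capacity, a quarter of what the chassis could hold.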





## Advantages of Contemporary Storage Solution

- Increase storage space
- Reduce latency
- Improve performance



- Switching is full duplex and runs in parallel with SAS HBA traffic
- Each M.2 module can be given affinity to one or many NUMA CPU cores
- Four M.2 modules add flexibility for resource aggregation
- Provides PCIe pass-through to VMs
- No driver needed
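
To reason about NUMA affinity, it helps to know which NUMA node each NVMe controller is attached to. The slides do not describe how affinity is configured, so the following is only a minimal sketch for inspecting topology on a Linux host. It assumes the standard sysfs layout where `/sys/class/nvme/<controller>/device/numa_node` holds the node number (`-1` when unknown). The function name `nvme_numa_nodes` is hypothetical.

```python
from pathlib import Path


def nvme_numa_nodes(sysfs_root="/sys/class/nvme"):
    """Map each NVMe controller name to its NUMA node (None if unknown).

    Assumes the Linux sysfs layout <root>/<ctrl>/device/numa_node.
    Returns an empty dict when the directory does not exist.
    """
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for ctrl in sorted(root.iterdir()):
        numa_file = ctrl / "device" / "numa_node"
        try:
            value = int(numa_file.read_text().strip())
        except (OSError, ValueError):
            value = -1  # attribute missing or unreadable
        # The kernel reports -1 when no NUMA information is available.
        nodes[ctrl.name] = value if value >= 0 else None
    return nodes


print(nvme_numa_nodes())
```

A scheduler or VM placement tool could use such a mapping to keep I/O threads on cores local to the drive, which is one way to read the affinity bullet above.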





# Why M.2 PCIe NVMe?

- Higher Bandwidth
  - SATA III SSD: 6 Gb/s
  - Gen 2 x 2: 8 Gb/s
  - Gen 3 x 2: ~16 Gb/s
  - Gen 3 x 4: ~32 Gb/s
- Low latency
- Native PCIe instead of AHCI Mode
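
The bandwidth figures above follow from the per-lane signaling rate and the line encoding. The sketch below reproduces them, using the published rates (PCIe Gen 2: 5 GT/s with 8b/10b encoding; PCIe Gen 3: 8 GT/s with 128b/130b; SATA III: 6 Gb/s line rate with 8b/10b). The helper name `effective_gbps` is hypothetical. Note that the 6 Gb/s SATA figure in the list appears to be the raw line rate, while the PCIe figures match the rate after encoding overhead.

```python
def effective_gbps(gt_per_s, lanes, payload_bits, total_bits):
    """Link bandwidth in Gb/s after line-encoding overhead (protocol overhead ignored)."""
    return gt_per_s * lanes * payload_bits / total_bits


links = {
    "SATA III":      effective_gbps(6.0, 1, 8, 10),     # 8b/10b encoding
    "PCIe Gen 2 x2": effective_gbps(5.0, 2, 8, 10),     # 8b/10b encoding
    "PCIe Gen 3 x2": effective_gbps(8.0, 2, 128, 130),  # 128b/130b encoding
    "PCIe Gen 3 x4": effective_gbps(8.0, 4, 128, 130),  # 128b/130b encoding
}

for name, gbps in links.items():
    print(f"{name:14s} {gbps:6.2f} Gb/s")
```

On a like-for-like basis, SATA III delivers about 4.8 Gb/s after encoding, so a Gen 3 x4 M.2 link offers roughly 6.5 times the usable bandwidth.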





#### AHCI vs. NVMe

|                                                     | AHCI                                                              | NVMe                                             |
|-----------------------------------------------------|-------------------------------------------------------------------|--------------------------------------------------|
| Maximum Queue Depth                                 | 1 command queue;<br>32 commands per queue                         | 65536 queues;<br>65536 commands per queue        |
| Uncacheable register accesses<br>(2000 cycles each) | 6 per non-queued command;<br>9 per queued command                 | 2 per command                                    |
| MSI-X<br>and interrupt steering                     | single interrupt;<br>no steering                                  | 2048 MSI-X interrupts                            |
| Parallelism<br>and multiple threads                 | requires synchronization lock<br>to issue a command               | no locking                                       |
| Efficiency<br>for 4 KB commands                     | command parameters require<br>two serialized host DRAM<br>fetches | gets command parameters<br>in one 64-byte fetch  |
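
The table's figures can be turned into rough per-command costs. The sketch below multiplies the register-access counts by the 2000-cycle cost quoted in the table and computes the theoretical maximum number of outstanding commands. All inputs come from the table; the results are back-of-the-envelope estimates, not measurements.

```python
CYCLES_PER_UNCACHEABLE_ACCESS = 2000  # cost per access, as quoted in the table

# Uncacheable register accesses needed to issue one command.
register_accesses = {
    "AHCI (non-queued)": 6,
    "AHCI (queued)": 9,
    "NVMe": 2,
}

# CPU cycles spent on register accesses per command.
overhead_cycles = {
    name: count * CYCLES_PER_UNCACHEABLE_ACCESS
    for name, count in register_accesses.items()
}

# Theoretical maximum outstanding commands: queues * commands per queue.
max_outstanding = {
    "AHCI": 1 * 32,
    "NVMe": 65536 * 65536,
}

for name, cycles in overhead_cycles.items():
    print(f"{name:18s} {cycles:6d} cycles per command")
for name, depth in max_outstanding.items():
    print(f"{name:18s} {depth} commands outstanding (max)")
```

By this estimate a queued AHCI command spends 18,000 cycles on register accesses against 4,000 for NVMe, a 4.5x reduction before any gains from lock-free submission are counted.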

### Legacy Solution CPU Utilization






150 IOPS/Lane





Storage CPU Usage

- Virtualization software is often starved for IOPS; PCIe pass-through can raise CPU utilization from a typical 30% to the desired 75%.





### Applications

- Storage Appliance
- EDA
- NAS Gateway
- Others



### Applications: Storage Appliance






### Applications: Electronic Design Automation (current)





### Applications: Electronic Design Automation (new)





| Category         | Features                                              |
|------------------|-------------------------------------------------------|
| Access Protocols | NFS, CIFS                                             |
| Management       | Single Point of Administration, User Defined Policies |
| Availability     | N-Way Clustering, QoS                                 |
| Data Services    | Compression, Deduplication, Replication, Encryption   |
| Caching          | In-Switch Metadata DRAM Cache Fabric                  |
| SAN Integration  | iSCSI, FCoE, AoE                                      |




# Thank You!