Storage Performance: Local NVMe SSD vs 10GbE NAS
Benchmarking storage tiers for our homelab. Why databases need local NVMe paths while object storage thrives on the NAS.
When I first moved my PostgreSQL database to the NAS over NFS, I could not understand why queries that took 2ms locally were suddenly taking 40ms. I spent an entire weekend running fio benchmarks, tweaking NFS mount options, and comparing iSCSI versus NFS paths before realizing that the problem was not bandwidth but latency — the 0.8ms round-trip penalty on every random 4K read was compounding across thousands of B-tree lookups per query. That experience fundamentally changed how I think about storage placement in Kubernetes: it is not about how fast you can stream a large file, but about how many tiny random reads your storage can serve per second. This article distills those benchmarks and the workload placement strategy that emerged from them.
Introduction
In a homelab or on-premises Kubernetes cluster, storage is often the biggest bottleneck. We have two main storage tiers available:
- Local NVMe SSDs: Directly attached to the compute nodes (PCIe Gen4).
- Network Attached Storage (NAS): Accessed via 10GbE network (TrueNAS Scale over NFS/iSCSI).
The question is: Which workload goes where?
What We’ll Measure
We will use fio to benchmark both paths from within a Kubernetes Pod to simulate real-world conditions.
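Benchmarking from inside a Pod matters because it includes the CSI and network path, not just the raw device. A throwaway Pod along these lines can do it; the image choice, PVC name `bench-pvc`, and mount path are illustrative placeholders, not our exact manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fio-bench
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: alpine:3.20
      command: ["sh", "-c"]
      args:
        - apk add --no-cache fio &&
          fio --name=random-write --ioengine=libaio --rw=randwrite
              --bs=4k --numjobs=1 --size=4g --iodepth=32
              --filename=/data/fio-test
      volumeMounts:
        - name: bench
          mountPath: /data
  volumes:
    - name: bench
      persistentVolumeClaim:
        claimName: bench-pvc   # bind this to the tier under test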
Architecture Overview
We use Longhorn as our Block Storage provider, but configured differently for each tier.
- Fast Path: Longhorn utilizing hostPath on NVMe. Good for databases.
- Bulk Path: NFS mount backed by TrueNAS ZFS. Good for MinIO/backups.
```mermaid
flowchart TD
    subgraph Workloads
        DB["PostgreSQL / Qdrant / FalkorDB"]
        OBJ["MinIO Object Storage"]
        BK["Backups & Snapshots"]
    end

    DB -->|"random 4K I/O"| NVME
    OBJ -->|"sequential 1M I/O"| NAS
    BK -->|"sequential writes"| NAS

    subgraph "Fast Path — Local NVMe"
        NVME["WD Black SN850X\n650K IOPS · 0.04ms · 2.5 GB/s"]
    end

    subgraph "Bulk Path — 10GbE NAS"
        NAS["TrueNAS RAIDZ2 (40 TB)\n45K IOPS · 0.8ms · 980 MB/s"]
    end

    style NVME fill:#1a2744,stroke:#6366f1,color:#e2e8f0
    style NAS fill:#1a2744,stroke:#f59e0b,color:#e2e8f0
```
[Longhorn Data Locality Settings]
— SUSE Rancher, 2024-03-15
Section 1: The Benchmarks
We ran fio with random read/write patterns (4k block size) to simulate database traffic, and sequential patterns (1M block size) for file streaming.
Local NVMe (WD Black SN850X)
```shell
fio --name=random-write --ioengine=libaio --rw=randwrite --bs=4k \
    --numjobs=1 --size=4g --iodepth=32
```
- IOPS: ~650,000
- Latency: 0.04ms
- Bandwidth: ~2.5 GB/s
10GbE NAS (NFS over TrueNAS)
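For the bulk path we ran the same fio patterns against the NFS mount. A sequential-read invocation along these lines exercises the streaming case; the mount path is a placeholder, and `--direct=1` keeps the client page cache out of the measurement:

```shell
# Sequential 1M reads over NFS; --direct=1 bypasses the client page cache
# so the network and NAS path are actually measured, not local RAM.
fio --name=seq-read --ioengine=libaio --rw=read --bs=1m \
    --numjobs=1 --size=4g --iodepth=32 --direct=1 \
    --filename=/mnt/nas/fio-test
```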
- IOPS: ~45,000
- Latency: 0.8ms (network round-trip overhead)
- Bandwidth: ~980 MB/s (saturating the 10GbE link)
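The latency column is the one that bites databases: per-I/O cost multiplies by the number of dependent reads in a query. A back-of-the-envelope check, assuming a query touches ~50 B-tree pages (a made-up but plausible figure), reproduces the 2ms-versus-40ms gap from the introduction:

```shell
# 50 dependent 4K reads per query (hypothetical), at 0.04ms vs 0.8ms each
awk 'BEGIN { printf "NVMe: %.1f ms/query\nNAS:  %.1f ms/query\n", 50*0.04, 50*0.8 }'
# NVMe: 2.0 ms/query
# NAS:  40.0 ms/query
```

Bandwidth barely differs by 2.5x between the tiers, but per-query latency differs by 20x, which is exactly what the application feels.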
Section 2: Workload Placement Strategy
Based on these numbers, we split our infrastructure.
Databases (PostgreSQL, Qdrant, FalkorDB)
Placement: Local NVMe.
Why: Vector search and graph traversals are random-access heavy. High IOPS and low latency are critical for “snappy” searches.
Mechanism: We use Kubernetes LocalPersistentVolumes or Longhorn with strict node affinity to keep data close to the CPU.
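As a sketch of the local-PV route (node name, capacity, class name, and path are all placeholders, and the class would use the `kubernetes.io/no-provisioner` provisioner), a PersistentVolume pinned to the node that owns the disk looks like this:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data-node1
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-nvme       # placeholder no-provisioner class
  local:
    path: /mnt/nvme/postgres         # where the NVMe is mounted on the node
  nodeAffinity:                      # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node1"]
```

The nodeAffinity block is what keeps the Pod scheduled next to its data; without it, Kubernetes could place the database on a node that cannot see the disk.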
Object Storage (MinIO) & Backups
Placement: NAS (NFS/iSCSI).
Why: Documents (PDFs, images) are read sequentially. A 20 MB PDF loads in about 0.02 s over 10GbE, which is imperceptible to the user.
Scale: The NAS offers 40 TB of redundant capacity (RAIDZ2), which NVMe cannot match cost-effectively.
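The 0.02 s figure is simply the transfer size divided by the measured NAS bandwidth:

```shell
awk 'BEGIN { printf "%.3f s\n", 20/980 }'   # 20 MB / 980 MB/s
# 0.020 s
```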
[Kubernetes Storage Best Practices] — Kubernetes Authors, 2024-06-01

Section 3: Configuring Kubernetes
To implement this, we define two StorageClasses.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-nvme
provisioner: driver.longhorn.io
parameters:
  dataLocality: best-effort
  numberOfReplicas: "1"   # rely on app-level replication or backups
  diskSelector: "nvme"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: bulk-nas
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.0.50
  share: /mnt/tank/k8s
```
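Workloads then pick a tier purely by StorageClass name in their claim. As an illustrative pair (names and sizes are examples, not our production values):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-nvme      # random-I/O tier
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data
spec:
  accessModes: ["ReadWriteMany"]   # NFS supports shared access
  storageClassName: bulk-nas       # sequential/bulk tier
  resources:
    requests:
      storage: 2Ti
```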
Conclusion
Don’t treat all storage as equal. By mapping workloads to the physical characteristics of the storage medium, we get the best of both worlds: extreme database speed and massive, cheap capacity for files.
The biggest takeaway from this benchmarking exercise was that raw throughput numbers tell only half the story. What actually matters is how your specific workload interacts with the storage path: random vs sequential, small vs large blocks, read-heavy vs write-heavy. I now run application-level latency benchmarks alongside fio whenever I change storage configurations, because a 10x improvement in fio numbers means nothing if the application’s access pattern does not benefit from it.
[Systems Performance: Enterprise and the Cloud] — Brendan Gregg, 2020-12-09

Next Steps
- See how Image Optimization helps keep storage usage low.
- Read about RAG Query Improvement where database speed matters.
- Dive into Longhorn Distributed Storage for the distributed block storage layer that powers the NVMe tier.