Storage Performance: Local NVMe SSD vs 10GbE NAS
Benchmarking storage tiers for our homelab. Why databases need local NVMe paths while object storage thrives on the NAS.
When I first moved my PostgreSQL database to the NAS over NFS, I could not understand why queries that took 2ms locally were suddenly taking 40ms. I spent an entire weekend running fio benchmarks, tweaking NFS mount options, and comparing iSCSI versus NFS paths before realizing that the problem was not bandwidth but latency — the 0.8ms round-trip penalty on every random 4K read was compounding across thousands of B-tree lookups per query. That experience fundamentally changed how I think about storage placement in Kubernetes: it is not about how fast you can stream a large file, but about how many tiny random reads your storage can serve per second. This article distills those benchmarks and the workload placement strategy that emerged from them.
Introduction
In a homelab or on-premises Kubernetes cluster, storage is often the biggest bottleneck. We have two main storage tiers available:
- Local NVMe SSDs: Directly attached to the compute nodes (PCIe Gen4).
- Network Attached Storage (NAS): Accessed via 10GbE network (TrueNAS Scale over NFS/iSCSI).
The question is: Which workload goes where?
What We’ll Measure
We will use fio to benchmark both paths from within a Kubernetes Pod to simulate real-world conditions.
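Benchmarking from inside a Pod matters because it includes the CSI and network path, not just the raw device. A throwaway Pod along these lines can do it; the image choice, PVC name `bench-pvc`, and mount path are illustrative placeholders, not our exact manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fio-bench
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: alpine:3.20
      command: ["sh", "-c"]
      args:
        - apk add --no-cache fio &&
          fio --name=random-write --ioengine=libaio --rw=randwrite
              --bs=4k --numjobs=1 --size=4g --iodepth=32
              --filename=/data/fio-test
      volumeMounts:
        - name: bench
          mountPath: /data
  volumes:
    - name: bench
      persistentVolumeClaim:
        claimName: bench-pvc   # bind this to the tier under test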
Architecture Overview
We use Longhorn as our Block Storage provider, but configured differently for each tier.
- Fast Path: Longhorn utilizing hostPath on NVMe. Good for databases.
- Bulk Path: NFS mount backed by TrueNAS ZFS. Good for MinIO/backups.
```mermaid
flowchart TD
    subgraph Workloads
        DB["PostgreSQL / Qdrant / FalkorDB"]
        OBJ["MinIO Object Storage"]
        BK["Backups & Snapshots"]
    end

    DB -->|"random 4K I/O"| NVME
    OBJ -->|"sequential 1M I/O"| NAS
    BK -->|"sequential writes"| NAS

    subgraph "Fast Path — Local NVMe"
        NVME["WD Black SN850X\n650K IOPS · 0.04ms · 2.5 GB/s"]
    end

    subgraph "Bulk Path — 10GbE NAS"
        NAS["TrueNAS RAIDZ2 (40 TB)\n45K IOPS · 0.8ms · 980 MB/s"]
    end

    style NVME fill:#1a2744,stroke:#6366f1,color:#e2e8f0
    style NAS fill:#1a2744,stroke:#f59e0b,color:#e2e8f0
```
[Longhorn Data Locality Settings]
— SUSE Rancher, 2024-03-15
Section 1: The Benchmarks
We ran fio with random read/write patterns (4k block size) to simulate database traffic, and sequential patterns (1M block size) for file streaming.
Local NVMe (WD Black SN850X)
```shell
fio --name=random-write --ioengine=libaio --rw=randwrite --bs=4k \
    --numjobs=1 --size=4g --iodepth=32
```
- IOPS: ~650,000
- Latency: 0.04ms
- Bandwidth: ~2.5 GB/s
10GbE NAS (NFS over TrueNAS)
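For the bulk path we ran the same fio patterns against the NFS mount. A sequential-read invocation along these lines exercises the streaming case; the mount path is a placeholder, and `--direct=1` keeps the client page cache out of the measurement:

```shell
# Sequential 1M reads over NFS; --direct=1 bypasses the client page cache
# so the network and NAS path are actually measured, not local RAM.
fio --name=seq-read --ioengine=libaio --rw=read --bs=1m \
    --numjobs=1 --size=4g --iodepth=32 --direct=1 \
    --filename=/mnt/nas/fio-test
```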
- IOPS: ~45,000
- Latency: 0.8ms (network round-trip overhead)
- Bandwidth: ~980 MB/s (saturating the 10GbE link)
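The latency column is the one that bites databases: per-I/O cost multiplies by the number of dependent reads in a query. A back-of-the-envelope check, assuming a query touches ~50 B-tree pages (a made-up but plausible figure), reproduces the 2ms-versus-40ms gap from the introduction:

```shell
# 50 dependent 4K reads per query (hypothetical), at 0.04ms vs 0.8ms each
awk 'BEGIN { printf "NVMe: %.1f ms/query\nNAS:  %.1f ms/query\n", 50*0.04, 50*0.8 }'
# NVMe: 2.0 ms/query
# NAS:  40.0 ms/query
```

Bandwidth barely differs by 2.5x between the tiers, but per-query latency differs by 20x, which is exactly what the application feels.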
Section 2: Workload Placement Strategy
Based on these numbers, we split our infrastructure.
Databases (PostgreSQL, Qdrant, FalkorDB)
Placement: Local NVMe.
Why: Vector search and graph traversals are random-access heavy. High IOPS and low latency are critical for “snappy” searches.
Mechanism: We use Kubernetes LocalPersistentVolumes or Longhorn with strict node affinity to keep data close to the CPU.
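As a sketch of the local-PV route (node name, capacity, class name, and path are all placeholders, and the class would use the `kubernetes.io/no-provisioner` provisioner), a PersistentVolume pinned to the node that owns the disk looks like this:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data-node1
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-nvme       # placeholder no-provisioner class
  local:
    path: /mnt/nvme/postgres         # where the NVMe is mounted on the node
  nodeAffinity:                      # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node1"]
```

The nodeAffinity block is what keeps the Pod scheduled next to its data; without it, Kubernetes could place the database on a node that cannot see the disk.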
Object Storage (MinIO) & Backups
Placement: NAS (NFS/iSCSI).
Why: Documents (PDFs, images) are read sequentially. A 20 MB PDF loads in about 0.02 s over 10GbE, which is imperceptible to the user.
Scale: The NAS offers 40 TB of redundant capacity (RAIDZ2), which NVMe cannot match cost-effectively.
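The 0.02 s figure is simply the transfer size divided by the measured NAS bandwidth:

```shell
awk 'BEGIN { printf "%.3f s\n", 20/980 }'   # 20 MB / 980 MB/s
# 0.020 s
```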
[Kubernetes Storage Best Practices] — Kubernetes Authors, 2024-06-01

Section 3: Configuring Kubernetes
To implement this, we define two StorageClasses.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-nvme
provisioner: driver.longhorn.io
parameters:
  dataLocality: best-effort
  numberOfReplicas: "1"   # rely on app-level replication or backups
  diskSelector: "nvme"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: bulk-nas
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.0.50
  share: /mnt/tank/k8s
```
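Workloads then pick a tier purely by StorageClass name in their claim. As an illustrative pair (names and sizes are examples, not our production values):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-nvme      # random-I/O tier
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data
spec:
  accessModes: ["ReadWriteMany"]   # NFS supports shared access
  storageClassName: bulk-nas       # sequential/bulk tier
  resources:
    requests:
      storage: 2Ti
```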
Conclusion
Don’t treat all storage as equal. By mapping workloads to the physical characteristics of the storage medium, we get the best of both worlds: extreme database speed and massive, cheap capacity for files.
The biggest takeaway from this benchmarking exercise was that raw throughput numbers tell only half the story. What actually matters is how your specific workload interacts with the storage path: random vs sequential, small vs large blocks, read-heavy vs write-heavy. I now run application-level latency benchmarks alongside fio whenever I change storage configurations, because a 10x improvement in fio numbers means nothing if the application’s access pattern does not benefit from it.
[Systems Performance: Enterprise and the Cloud] — Brendan Gregg, 2020-12-09

Next Steps
- See how Image Optimization helps keep storage usage low.
- Read about RAG Query Improvement where database speed matters.
- Dive into Longhorn Distributed Storage for the distributed block storage layer that powers the NVMe tier.