On Wed, 2019-11-20 at 16:03 +0000, nufosmatic(a)nufosmatic.com wrote:
So, the engineers in our group think SPDK/DPDK/NVMe is the right choice for
the performance storage component of the system we are designing. That's not
the same thing as evidence, or a convincing argument with some simple numbers
that management (and the prospect) can understand. I've now discovered the FIO
tool and the "plugin" for NVMe in SPDK and it looks like the right objective
performance measurement tool for the job. (I wish I'd found FIO for a post-mortem
of a rather disastrous storage solution we took way too long to analyze; IOmeter
finally exposed the problem, which was never solved.)
So we would like to walk management and the prospect through the world before
and the world after SPDK, and to demonstrate the features and benefits using FIO.
Here's the walk:
1 [X] FIO => NVMe native block device
2 [X] FIO => "plugin" SPDK NVMe
3 [ ] FIO => "block device emulation" SPDK NVMe
4 [ ] FIO => Host NVMe-oF <=network=> Target NVMe-oF SPDK NVMe
1 - basic performance of any given NVMe device
2 - performance advantage of SPDK for NVMe device(s)
3 - non-network overhead of SPDK as a means to NVMe device(s)
4 - network-connected storage accelerated by SPDK
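For reference, case 1 is just plain FIO against the kernel's native block device,
with a job along these lines (device path and job parameters are placeholders):

    [global]
    ioengine=libaio    ; kernel async I/O against the native block device
    direct=1           ; bypass the page cache
    rw=randread
    bs=4k
    iodepth=32
    runtime=60
    time_based=1

    [case1-native-nvme]
    filename=/dev/nvme0n1   ; whatever name the kernel gave the device under test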
So I think I understand cases 1 and 2; the 'X' means I think I've done them. I
think I understand case 4 (I'm waiting for the rest of the networking hardware
to show up).
I suspect that there is some way to present the SPDK NVMe device(s) as a block
device locally so that I might be able to do case 3, but I just have not found
something that states plainly that this is supported or unsupported.
Do you mean presenting a device managed by SPDK through the Linux kernel block
stack? Like under /dev? This is possible using things like NBD or the Linux
kernel NVMe-oF initiator in loopback, but there are significant performance
penalties.
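For example, with an SPDK target application such as spdk_tgt running (and the
nbd kernel module loaded), something along these lines exports an NVMe namespace
as a kernel block device over NBD; the PCI address is a placeholder and the exact
RPC names have shifted a little between SPDK releases:

    # attach the local NVMe controller to the SPDK bdev layer
    scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:04:00.0
    # expose the resulting bdev through the kernel NBD driver
    scripts/rpc.py nbd_start_disk Nvme0n1 /dev/nbd0
    # /dev/nbd0 now behaves like an ordinary kernel block device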
But you may also mean presenting an SPDK block device directly to fio. SPDK has
its own block storage stack (lib/bdev) with a "module" system where different
types of devices can be plugged in. The Linux kernel would call a "module" a
device driver. One of the modules is "NVMe", but there are a dozen or more others
that can talk to all sorts of things (iSCSI, libaio, Ceph RBD, PMEM, etc.).
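One way to see this is a simple bdev config file: a RAM-backed Malloc bdev and an
NVMe bdev plug into the same stack side by side. This is the legacy INI format
(newer releases use a JSON config or runtime RPCs instead), and the sizes and PCI
address here are placeholders:

    [Malloc]
      NumberOfLuns 1
      LunSizeInMB 64

    [Nvme]
      TransportID "trtype:PCIe traddr:0000:04:00.0" Nvme0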
SPDK has two fio plugins. One (the one in examples/nvme/fio_plugin) talks
directly to the SPDK NVMe driver - i.e. the very lowest part of the SPDK stack.
I think this is what you're doing with #2 above. The other plugin (the one in
examples/bdev/fio_plugin) talks to the SPDK block layer. You can use that plugin
to talk to any SPDK block device, including an NVMe device. The reason we have
both is sometimes we want to benchmark just the lower layer NVMe driver, and
sometimes we want to benchmark the whole block stack.
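Roughly, the two invocations differ only in which plugin is preloaded and which
ioengine is named. Both assume SPDK was configured --with-fio so the plugins got
built; the paths, PCI address, and job parameters are placeholders (note the dots
instead of colons in the address, because fio reserves ':' in filenames):

    # NVMe driver plugin (your case 2)
    LD_PRELOAD=examples/nvme/fio_plugin/fio_plugin fio --ioengine=spdk --thread \
        --filename='trtype=PCIe traddr=0000.04.00.0 ns=1' \
        --name=case2 --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based

    # bdev plugin, driving the whole SPDK block stack (your case 3)
    LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin fio --ioengine=spdk_bdev --thread \
        --spdk_conf=./bdev.conf --filename=Nvme0n1 \
        --name=case3 --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based

where bdev.conf is a config along the lines of the one sketched above.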
Note that fio itself has some performance issues. You will not be able to
measure the full performance of SPDK via fio - the tool itself is just too slow.
We provide plugins primarily because it is the industry standard tool, so it's a
quick way for people to get up and running. We recommend you use the SPDK 'perf'
tool in examples/nvme/perf to benchmark the SPDK NVMe driver instead. It has
fewer features than fio, but has much lower overhead.
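A local run looks something like this (queue depth, I/O size, run time, and the
PCI address are just examples):

    examples/nvme/perf/perf -q 128 -o 4096 -w randread -t 60 \
        -r 'trtype:PCIe traddr:0000:04:00.0'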
Also note that using the NVMe-oF initiator is exactly the same as using the
local NVMe driver in these benchmark tools. You simply configure it with a
different transport type (tcp or rdma instead of pcie) and transport address (an
IP/port instead of a PCI bus:device:function).
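For example, the same perf run pointed at an NVMe-oF target over TCP just swaps
the transport ID, and the fio NVMe plugin's filename string changes the same way
(addresses here are placeholders):

    examples/nvme/perf/perf -q 128 -o 4096 -w randread -t 60 \
        -r 'trtype:TCP adrfam:IPv4 traddr:192.0.2.10 trsvcid:4420'

    # fio NVMe plugin equivalent
    --filename='trtype=TCP adrfam=IPv4 traddr=192.0.2.10 trsvcid=4420 ns=1'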
5 - Soft-RoCE-based NVMe-oF implementation to work through software issues
when hardware is not available
In my shop the RDMA stuff is as rare as hen's teeth, and getting a system with
NVMe and RDMA in the same chassis is rarer still. I have a bucketload of software
issues to understand that don't involve the hardware. Has anybody danced Soft-RoCE
around with SPDK and NVMe-oF? (It's a wonderful way to get your head wrapped
around RDMA programming when you just can't get the right hardware to play with.)
We use Soft-RoCE all the time and it works well enough. It is not representative
of the way RoCE actually behaves, especially during failure scenarios, and the
performance is quite bad. But it certainly is fine for basic testing.
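For reference, bringing it up is just a matter of layering an rxe device on top of
an ordinary NIC (the interface name is a placeholder); after that the rdma
transport is used exactly as it would be on real hardware:

    # newer iproute2
    rdma link add rxe0 type rxe netdev eth0
    # or, with the older rxe_cfg script from rdma-core
    rxe_cfg start
    rxe_cfg add eth0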
However, as Andrey suggested, I've mostly switched to using the tcp transport
for NVMe-oF local development and testing instead.