With this message I wanted to update the SPDK community on the state of the VPP socket abstraction as of the SPDK 19.07 release.
At this time there does not seem to be a clear efficiency improvement with VPP, and no further work is planned on the SPDK and VPP integration.
As some of you may remember, the SPDK 18.04 release introduced support for alternative socket types. Along with that release, Vector Packet Processing (VPP)<https://wiki.fd.io/view/VPP> 18.01 was integrated with SPDK by extending the socket abstraction to use the VPP Communications Library (VCL). The TCP/IP stack in VPP<https://wiki.fd.io/view/VPP/HostStack> was in its early stages back then and has seen improvements throughout the last year.
To make better use of VPP's capabilities, and following fruitful collaboration with the VPP team, in SPDK 19.07 the implementation was changed from VCL to the VPP Session API from VPP 19.04.2.
The VPP socket abstraction has met some challenges due to the inherent design of both projects, in particular related to running separate processes and memory copies.
Seeing improvements over the original implementation was encouraging, yet when measuring against the posix socket abstraction (taking the entire system into consideration, i.e. both processes), the results are comparable. In other words, at this time there does not seem to be a clear benefit to either socket abstraction from the standpoint of CPU efficiency or IOPS.
With this message I just wanted to update the SPDK community on the state of the socket abstraction layers as of the SPDK 19.07 release. Each SPDK release brings improvements to the abstraction and its implementations, with exciting work on more efficient use of the kernel TCP stack coming in SPDK 19.10 and SPDK 20.01.
However, there is no active involvement around the VPP implementation of the socket abstraction in SPDK at this point. Contributions in this area are always welcome. If you're interested in implementing further enhancements to the VPP and SPDK integration, feel free to reply, or use one of the many SPDK community communication channels<https://spdk.io/community/>.
Using SPDK 19.07, NVMe target over TCP, in a multi-process environment:
The memory zone name that is reserved for the NVMe target is not unique, so:
1. Does that mean that two NVMe target processes running as secondaries cannot run simultaneously?
2. In the case of only one secondary running as the NVMe target, if the NVMe target process exits unexpectedly, it looks like it will not be able to create the memory zone on the next run, because the zone already exists from the previous run.
Am I right? And if yes, is there any simple workaround?
I'm trying to create mock spdk_nvme_ctrlr structs for use in my unit tests. I can see this is done in spdk/test/unit/lib/nvme/nvme.c/nvme_ut.c by importing nvme/nvme.c, which indirectly gives access to the underlying definition of spdk_nvme_ctrlr in lib/nvme/nvme_internal.h. My unit test source is outside of the SPDK unit test scaffolding, so I'm getting the expected "incomplete type" messages when I try to allocate memory with the size of the type. I haven't been able to figure out a way of including nvme_internal.h to get access to the underlying type. Any suggestions?
Does it make sense to run ./configure --with-rdma --enable-lto --enable-debug ? I'm trying to get debug logging along with some of my testing, but
when I combine these two options, I get the following make errors:
/tmp/cczVeafv.ltrans0.ltrans.o: In function `spdk_bdev_io_get_scsi_status':
<artificial>:(.text+0x9bbb): undefined reference to `spdk_scsi_nvme_translate'
collect2: error: ld returned 1 exit status
spdk/mk/spdk.unittest.mk:71: recipe for target 'bdev_ut' failed
make: *** [bdev_ut] Error 1
spdk/mk/spdk.subdirs.mk:44: recipe for target 'bdev.c' failed
make: *** [bdev.c] Error 2
spdk/mk/spdk.subdirs.mk:44: recipe for target 'mt' failed
make: *** [mt] Error 2
spdk/mk/spdk.subdirs.mk:44: recipe for target 'bdev' failed
make: *** [bdev] Error 2
spdk/mk/spdk.subdirs.mk:44: recipe for target 'lib' failed
make: *** [lib] Error 2
spdk/mk/spdk.subdirs.mk:44: recipe for target 'unit' failed
make: *** [unit] Error 2
spdk/mk/spdk.subdirs.mk:44: recipe for target 'test' failed
make: *** [test] Error 2
If I leave out --enable-debug, the make succeeds.
SPDK version: v19.04.1
I tried to use spdk_pci_device_map_bar to get the mapped virtual address of a BAR, but the obtained value is 0. The physical address and size are correct.
If I use uio_pci_generic instead, the same code runs without problem. Is this a bug?
So, the engineers in our group think SPDK/DPDK/NVMe is the right solution for the performance storage component on the system we are designing. That's not the same thing as evidence, or a convincing argument with some simple numbers that management (and the prospect) can understand. I've now discovered the FIO tool and the "plugin" for NVMe in SPDK, and it looks like the right objective performance measurement tool for the job. (I wish I'd found FIO for a post-mortem of a rather disastrous storage solution we took way too long to analyze. IOmeter finally showed up the problem, which was never solved.)
So we would like to be able to walk management and the prospect through the world before and the world after SPDK and be able to demonstrate features and benefits using FIO.
Here's the walk:
1 [X] FIO => NVMe native block device
2 [X] FIO => "plugin" SPDK NVMe
3 [ ] FIO => "block device emulation" SPDK NVMe
4 [ ] FIO => Host NVMe-oF <=network=> Target NVMe-oF SPDK NVMe
1 - basic performance of any given NVMe device
2 - performance advantage of SPDK for NVMe device(s)
3 - non-network overhead of SPDK as a means to NVMe device(s)
4 - network-connected storage accelerated by NVMe
So I think I understand cases 1 and 2. The 'X' means I think I've actually done it. I think I understand case 4 (I'm waiting for the rest of the networking hardware to show up).
I suspect that there is some way to present the SPDK NVMe device(s) as a block device locally so that I might be able to do case 3, but I just haven't found anything that states plainly whether this is supported or unsupported.
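For case 3, SPDK also carries a second fio plugin that drives the full SPDK block device (bdev) layer rather than the NVMe driver directly - examples/bdev/fio_plugin in the tree - which sounds like the non-network overhead measurement being described. A sketch of a job file; hedged: the spdk_conf path and the bdev name Nvme0n1 below are assumptions to check against the fio_plugin README in your tree:

```
; fio job sketch for the SPDK bdev fio plugin (names are illustrative)
[global]
ioengine=spdk_bdev       ; requires LD_PRELOAD of the built fio_plugin
spdk_conf=./bdev.conf    ; SPDK config file that defines the bdevs (assumed path)
thread=1
direct=1
rw=randread
bs=4k
iodepth=32
time_based=1
runtime=60

[job0]
filename=Nvme0n1         ; bdev name from the SPDK config, not a file path
```

Comparing this against the NVMe fio plugin (case 2) on the same device should isolate the bdev-layer overhead.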
5 - Soft-RoCE-based NVMe-oF implementation to work through software issues when hardware is not available
In my shop the RDMA stuff is as rare as hen's teeth, and getting a system with NVMe and RDMA in the same chassis is rarer still. I have a bucketload of software issues to understand that don't involve the hardware. Has anybody danced Soft-RoCE around with SPDK and NVMe-oF? (It's a wonderful way to get your head wrapped around RDMA programming when you just can't get the right hardware to play with.)
I have a need to be able to run SPDK in a performance mode and also check for "health", especially package temperatures, as the device is running.
I was very frustrated that I could not run smartctl concurrently with SPDK to get device information on NVMe devices. Building a performance system without being able to determine health is a non-starter.
I got my mind right this morning and I thought I'd share:
* In "examples/nvme", there exists "identify" which provides much of the "health" information one would usually get from "smartctl".
* If you naively attempt to run "identify" concurrently with "perf", you will discover that you can run one or the other, but not both; the second process complains about "claiming" a device.
* If you look at the command options, you will find "shared memory ID", typically "-i ID", which indicates a shared memory ID that multiple processes can access concurrently. You can now run "perf -i ID ..." and then "identify -i ID ..." and, for instance, watch the temperature on the packages rise over time.
* If you look at the code for "nvme/hello_world", you will find that spdk_env_opts has a field "shm_id". This is apparently what gets populated from the "-i ID" option on the command line of those other examples. hello_world sets shm_id = -1 (the default - no shared memory); if you add an option to capture the ID value and update this field, you will be able to run "hello_world" along with "perf" and/or "identify".
* hello_world could be a place to make a simpler temperature sensor (using _HEALTH_ message as the data source), or to include health sensing in a larger application.
* This process still gets blivits in the involved processes. I haven't figured this out [yet].
I am encountering some issues with multi-process and NVMe. Below is the setup.
Is anyone experiencing this kind of issue? What am I doing wrong?
* 4 nvme pci devices (addresses - 0000:00:0b.0, 0000:00:0c.0, 0000:00:0d.0,
* two perf processes , one as primary and one as secondary
The first process to run (as primary) probes and accesses all 4 available NVMe devices:
./perf -q 1 -o 4096 -w randread -c 0x1 -t 360 -i 1
The second process to run (secondary) probes only two of the 4 devices:
./perf -r 'trtype:PCIe traddr:0000:00:0b.0' -r 'trtype:PCIe
traddr:0000:00:0c.0' -q 8 -o 131072 -w write -c 0x10 -t 60 -i 1
The secondary crashes with a segmentation fault.
Scenarios that do work:
* When running the secondary to probe all devices, it works fine,
i.e. ./perf -r -q 8 -o 131072 -w write -c 0x10 -t 60 -i 1
* When running only one process with the PCI device list, it works fine as well
(i.e. ./perf -r 'trtype:PCIe traddr:0000:00:0b.0' -r 'trtype:PCIe
traddr:0000:00:0c.0' -q 8 -o 131072 -w write -c 0x10 -t 60 -i 1)
I've got an open design question that I wanted to run by the community regarding
the NVMe-oF target. There are a few primitives that the library currently defines:
1) NVMe-oF transports are a networking transport abstraction. These have memory
and request object pools and can create and destroy connections.
2) NVMe-oF poll groups are sets of NVMe-oF queue pairs (that aren't necessarily
related to one another). A single poll group can aggregate connections from
multiple different transports.
3) NVMe-oF subsystems are sets of related namespaces. A subsystem is effectively
an access control list.
4) NVMe-oF controllers are network sessions. A controller is a set of NVMe-oF queue pairs
(connections) that are all connected to the same subsystem and accessing the
subsystem's namespaces.
6) NVMe-oF targets are a set of NVMe-oF subsystems, controllers, and poll
groups. The target object really defines the NVMe-oF discovery service.
Historically, the SPDK NVMe-oF target application created a single, global NVMe-
oF target object. But we're trying to generalize the underlying nvmf library to
support multiple targets so that a single application can support multiple
discovery services for more complex use cases.
My question is around how to map transports (#1) to targets (#6). There are two
paths we could go down. I'll use the RDMA transport as an example.
A) Each target gets its own transport. That means there would be two RDMA
transports for two targets, and each transport would allocate its own pool of
memory and request objects.
B) Transports can be shared across targets. Two different targets can share a
single RDMA transport. They can't share connections or listen on the same
addresses, of course, but they share the same request/buffer pools internally.
Does anyone have a strong opinion on what the right choice is here?
Does anyone out there have any experience (any at all) with SPDK and Docker or Kata or anything similar? I'm starting to put our story together here and am guessing someone out there has done something already that can help me get started ☺
Feel free to email me privately or slack me, I’ll consolidate everything and start a doc page as soon as there’s enough content.
PS: There’s still time to register to the Developer Meetup, see https://spdk.io/news/2019/09/06/dev_meetup/