I hope you are the right person to contact about this; please point me
in the right direction otherwise. I am also Cc'ing the SPDK and
qemu-devel mailing lists, to solicit community feedback.
As part of my internship at Arrikto, I have spent the last few months
working on the SPDK vhost target application. I was motivated by the
“VirtioVhostUser” feature you proposed for QEMU
[https://wiki.qemu.org/Features/VirtioVhostUser], and my end goal is to
have an end-to-end system running, where a slave VM offers storage to a
master VM over vhost-user, backed by an underlying SCSI block device.
My current approach is to use virtio-scsi-based storage
inside the slave VM.
I see that you have managed to move the vhost-user backend inside a VM
over a virtio-vhost-user transport. I have experimented with running the
SPDK vhost app over vhost-user, but have run into quite a few problems
with the virtio-pci driver. Apologies in advance for the rather lengthy
email; I would definitely value any short-term hints you may have, as
well as any longer-term feedback you may offer on my general direction.
My current state is:
I started with your DPDK code at
https://github.com/stefanha/dpdk/tree/virtio-vhost-user, and read about
your effort to integrate the DPDK vhost-scsi application with
virtio-vhost-user.
My initial approach was to replicate your work, but with the SPDK vhost
library running over virtio-vhost-user. I have pushed all of my code to
the following repository; it is still a WIP and I really need to tidy it up.
Hacks I had to do:
- I use the modified script usertools/dpdk-devbind.py found in your DPDK
repository here: https://github.com/stefanha/dpdk to bind the
virtio-vhost-user device to the vfio-pci kernel driver. The SPDK setup
script in scripts/setup.sh does not handle unclassified devices like
the virtio-vhost-user device. I plan to fix this later.
- I pass the PCI address of the virtio-vhost-user device to the vhost
library, by repurposing the existing -S option; it no longer refers to
the UNIX socket, as in the case of the UNIX transport. This means the
virtio-vhost-user transport is hardcoded and not configurable by the
user. I plan to fix this later.
- I copied your code that implements the virtio-vhost-user transport and
  made the necessary changes to abstract the transport implementation
  (see the sketch after this list for the kind of abstraction I mean).
I also copied the virtio-pci code from DPDK rte_vhost into the SPDK
vhost library, so the virtio-vhost-user driver could use it. I saw
this is what you did as a quick hack to make the DPDK vhost-scsi
application handle the virtio-vhost-user device.
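To give a rough idea of the direction, here is a minimal sketch of the kind of
transport abstraction I have in mind. The struct and function names below are
made up for illustration only and do not match my WIP code:
-- cut here --
/*
 * Hypothetical sketch of a vhost-user transport abstraction; all names
 * here are illustrative.
 */
struct vhost_user_msg;          /* a vhost-user protocol message */
struct vhost_memory_regions;    /* the memory table of the master VM */

struct vhost_transport_ops {
	/* Bring up the transport: listen on a UNIX socket, or map the
	 * BARs of the virtio-vhost-user PCI device. */
	int (*init)(const char *id);    /* socket path or PCI address */

	/* Exchange vhost-user protocol messages over the transport. */
	int (*recv_msg)(struct vhost_user_msg *msg);
	int (*send_reply)(const struct vhost_user_msg *msg);

	/* Map the master VM's memory: the UNIX transport mmap()s the fds it
	 * receives, while the virtio-vhost-user transport accesses the
	 * additional-resources BAR of the device instead. */
	int (*map_memory)(struct vhost_memory_regions *regions);

	void (*cleanup)(void);
};

/* One ops table per transport; in my current hack the PCI address passed
 * via -S simply selects the virtio-vhost-user ops table. */
extern const struct vhost_transport_ops vhost_user_unix_ops;
extern const struct vhost_transport_ops virtio_vhost_user_ops;
-- cut here --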
Having done that, I tried to demo my integration end-to-end, and
everything worked fine with a Malloc block device, but things broke
when I switched to a virtio-scsi block device inside the slave. My
attempts to call construct_vhost_scsi_controller failed with an I/O
error. Here is the log:
-- cut here --
$ export VVU_DEVICE="0000:00:06.0"
$ sudo modprobe vfio enable_unsafe_noiommu_mode=1
$ sudo modprobe vfio-pci
$ sudo ./dpdk-devbind.py -b vfio-pci $VVU_DEVICE
$ cd spdk
$ sudo scripts/setup.sh
Active mountpoints on /dev/vda, so not binding PCI dev 0000:00:04.0
0000:00:05.0 (1af4 1004): virtio-pci -> vfio-pci
$ sudo app/vhost/vhost -S "$VVU_DEVICE" -m 0x3 &
$ Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...
[ DPDK EAL parameters: vhost -c 0x3 -m 1024 --file-prefix=spdk_pid3918 ]
EAL: Multi-process socket /var/run/.spdk_pid3918_unix
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 1af4:1017 virtio_vhost_user
EAL: using IOMMU type 8 (No-IOMMU)
EAL: Ignore mapping IO port bar(0)
VIRTIO_PCI_CONFIG: found modern virtio pci device.
VIRTIO_PCI_CONFIG: modern virtio pci detected.
VHOST_CONFIG: Added virtio-vhost-user device at 0000:00:06.0
$ sudo scripts/rpc.py construct_virtio_pci_scsi_bdev 0000:00:05.0 VirtioScsi0
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 1af4:1004 spdk_virtio
EAL: Ignore mapping IO port bar(0)
$ sudo scripts/rpc.py construct_vhost_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: BAR 2 not availabled
Got JSON-RPC error response
"message": "Input/output error",
-- cut here --
This was really painful to debug. I managed to find the cause yesterday;
I had bumped into this DPDK bug:
and I worked around it, essentially by short-circuiting the point where
the DPDK runtime rescans the PCI bus and corrupts the
dev->mem_resource field for the already-mapped-in-userspace
virtio-vhost-user PCI device. I just commented out this line:
This seems to be a good enough workaround for now. I’m not sure whether this
bug has been fixed yet; I will comment on the DPDK Bugzilla.
But now I have really hit a roadblock: I get a segfault. I run the
exact same commands as shown above, and end up with this backtrace:
-- cut here --
#0 0x000000000046ae42 in spdk_bdev_get_io (channel=0x30) at bdev.c:920
#1 0x000000000046c985 in spdk_bdev_readv_blocks (desc=0x93f8a0, ch=0x0,
iov=0x7ffff2fb7c88, iovcnt=1, offset_blocks=0, num_blocks=8,
cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>, cb_arg=0x7ffff2fb7bc0) at bdev.c:1696
#2 0x000000000046c911 in spdk_bdev_readv (desc=0x93f8a0, ch=0x0, iov=0x7ffff2fb7c88,
iovcnt=1, offset=0, nbytes=4096, cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>,
cb_arg=0x7ffff2fb7bc0) at bdev.c:1680
#3 0x0000000000453fe2 in spdk_bdev_scsi_read (bdev=0x941c80, bdev_desc=0x93f8a0,
bdev_ch=0x0, task=0x7ffff2fb7bc0, lba=0, len=8) at scsi_bdev.c:1317
#4 0x000000000045462e in spdk_bdev_scsi_readwrite (task=0x7ffff2fb7bc0, lba=0,
xfer_len=8, is_read=true) at scsi_bdev.c:1477
#5 0x0000000000454c95 in spdk_bdev_scsi_process_block (task=0x7ffff2fb7bc0)
#6 0x00000000004559ce in spdk_bdev_scsi_execute (task=0x7ffff2fb7bc0)
#7 0x00000000004512e4 in spdk_scsi_lun_execute_task (lun=0x93f830, task=0x7ffff2fb7bc0)
#8 0x0000000000450a87 in spdk_scsi_dev_queue_task (dev=0x713c80 <g_devs>,
task=0x7ffff2fb7bc0) at dev.c:264
#9 0x000000000045ae48 in task_submit (task=0x7ffff2fb7bc0) at vhost_scsi.c:268
#10 0x000000000045c2b8 in process_requestq (svdev=0x7ffff31d9dc0, vq=0x7ffff31d9f40)
#11 0x000000000045c4ad in vdev_worker (arg=0x7ffff31d9dc0) at vhost_scsi.c:685
#12 0x00000000004797f2 in _spdk_reactor_run (arg=0x944540) at reactor.c:471
#13 0x0000000000479dad in spdk_reactors_start () at reactor.c:633
#14 0x00000000004783b1 in spdk_app_start (opts=0x7fffffffe390,
start_fn=0x404df8 <vhost_started>, arg1=0x0, arg2=0x0) at app.c:570
#15 0x0000000000404ec0 in main (argc=7, argv=0x7fffffffe4f8) at vhost.c:115
-- cut here --
I have not yet been able to debug this; it’s most probably my bug, but I
am wondering whether there could be a conflict between the two distinct
virtio drivers: (1) the pre-existing one in the SPDK virtio library
under lib/virtio/, and (2) the one I copied into lib/vhost/rte_vhost/ as
part of the vhost library.
I understand that even if I make it work for now, this cannot be a
long-term solution. I would like to re-use the pre-existing virtio-pci
code from the virtio library to support virtio-vhost-user.
Do you see any potential problems with this approach? Did you change the virtio
code that you placed inside rte_vhost? It seems there are subtle
differences between the two codebases.
These are my short-term issues. In the longer term, I’d be happy to
contribute to VirtioVhostUser development in any way I can. I have seen
some TODOs in your QEMU code here:
and I would like to help with them, but it’s not obvious to me what
progress you’ve made since then.
As an example, I’d love to explore the possibility of adding support for
interrupt-driven vhost-user backends over the virtio-vhost-user transport.

To sum up, my questions and next steps are:
- I will follow up on the DPDK bug here:
  https://bugs.dpdk.org/show_bug.cgi?id=85 with a proposed fix.
- Any hints on my segfault? I will definitely continue troubleshooting.
- Once I’ve sorted this out, how can I start using a single copy of the
virtio-pci codebase? I guess I have to make some changes to comply
with the API and check the dependencies.
- My current plan for contributing towards an IRQ-based implementation of
  the virtio-vhost-user transport is to use the vhost-user kick file
  descriptors as a trigger to inject virtual interrupts and handle them
  in userspace. The virtio-vhost-user device could exploit KVM's irqfd
  mechanism for this purpose (a rough sketch of the idea follows below).
  I will keep you and the list posted on this; I would appreciate any
  early feedback you may have.
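To make the irqfd idea a bit more concrete, here is a minimal userspace sketch
of how an eventfd can be wired up as an injected guest interrupt via KVM's
irqfd mechanism. This is only an illustration of the kernel interface I have
in mind, not QEMU or SPDK code; the VM fd, kick fd and GSI number below are
placeholders:
-- cut here --
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Illustration only: associate an eventfd with a guest interrupt line (GSI)
 * on an existing KVM VM fd. Once registered, any write to the eventfd
 * (e.g. triggered by a vhost-user kick) injects an interrupt into the guest,
 * so the backend in the slave VM can be interrupt-driven instead of polling.
 */
static int register_irqfd(int vm_fd, int kick_fd, unsigned int gsi)
{
	struct kvm_irqfd irqfd;

	memset(&irqfd, 0, sizeof(irqfd));
	irqfd.fd = kick_fd;     /* eventfd to listen on (e.g. vhost-user kick fd) */
	irqfd.gsi = gsi;        /* guest interrupt line to raise */

	return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}

int main(void)
{
	/* Placeholders: in reality the VM fd comes from KVM_CREATE_VM inside
	 * the VMM, and the kick fd from the vhost-user SET_VRING_KICK message. */
	int vm_fd = -1;
	int kick_fd = eventfd(0, EFD_NONBLOCK);
	unsigned int gsi = 5;

	if (register_irqfd(vm_fd, kick_fd, gsi) < 0)
		perror("KVM_IRQFD");

	close(kick_fd);
	return 0;
}
-- cut here --
Presumably the virtio-vhost-user device model in QEMU would be the one doing
this registration, so that the backend in the slave VM just sees an ordinary
virtio interrupt; I still need to look into the details.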
Looking forward to any comments/feedback/pointers you may have. I am
rather inexperienced with this stuff, but it’s definitely exciting and
I’d love to contribute more to QEMU and SPDK.
Thank you for reading this far,
School of Electrical and Computer Engineering
National Technical University of Athens
A few improvements have recently been made to the patch review and CI workflows.
First – the CI environment can now be told to retrigger testing of a patch, when it appears that the previous test hit an intermittent failure unrelated to the patch in question. To retrigger testing of a patch, simply add a comment to the patch on GerritHub that starts with the word “retrigger”.
Second – the core maintainers are now using Gerrit hashtags in some cases to facilitate patch review workflow. An explanation of the hashtags is listed below.
All of this will be documented on http://spdk.io/development in the next couple of days.
= = =
SPDK core maintainers use a custom Gerrit dashboard to determine patches they should focus on next for review. In some cases, a core maintainer may request further action before voting on the patch. Further actions may include answering a question about a patch, requesting a rebase for the patch, or requesting that another developer first vote the patch +1.
In these cases, core maintainers will use Gerrit hashtags to mark patches where further action has been requested. This will remove the patch from the list of patches that core maintainers should focus on.
Once the request is met (i.e. the question is answered, the patch is rebased, or another developer voted the patch +1), the hashtag can be removed by the patch owner. Once the hashtag is removed, the patch will immediately go to the top of the list of patches for the core maintainers to review.
Hashtags for a patch are listed on the patch's main Gerrit review page, in the same area where the owner, reviewer, project and branch are listed. Hashtags can be removed by clicking on the small X next to the name of the hashtag.
The following hashtags are currently used by core maintainers:
* "waiting for +1" - requests a specific reviewer to provide the +1 or requests help from anyone to review the patch; the core maintainer will provide specifics in a comment on the patch
* "needs rebase" - this requests the submitter to rebase the patch from the latest master branch (or in some cases, newer versions of unmerged patches that the tagged patch depends on)
* "question" - the core maintainer has a question about the patch that needs to be answered before the core maintainer can vote on the patch
I have been looking at SPDK as a way to run a specific set of NVMe IO commands in a specific sequence. I have worked through some of the examples such as hello_world, identify, and perf, but am still hitting issues when writing my own code to run even a single IO command. I am typically seeing a segmentation fault, most likely due to the data structures or parameters being passed to the read or write commands.
Are there any simplified examples available that just show single IO commands being created and sent to the device?
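For context, this is roughly the shape of what I am attempting, reduced to a
single read (a simplified sketch based on the hello_world example, not my
actual code; the controller and namespace are assumed to have been obtained
via spdk_nvme_probe() as in that example):
-- cut here --
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/nvme.h"

static bool g_read_done;

/* Completion callback for the single read below. */
static void
read_complete(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
	if (spdk_nvme_cpl_is_error(cpl)) {
		fprintf(stderr, "read failed\n");
	}
	g_read_done = true;
}

/*
 * Issue one read of a single block at LBA 0 and poll for its completion.
 * ctrlr and ns come from spdk_nvme_probe(), as in hello_world.
 */
static int
single_read(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
	struct spdk_nvme_qpair *qpair;
	void *buf;
	int rc;

	qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);
	if (qpair == NULL) {
		return -1;
	}

	/* The data buffer must come from SPDK's DMA-safe allocator,
	 * not from plain malloc(). */
	buf = spdk_dma_zmalloc(spdk_nvme_ns_get_sector_size(ns), 0x1000, NULL);
	if (buf == NULL) {
		spdk_nvme_ctrlr_free_io_qpair(qpair);
		return -1;
	}

	rc = spdk_nvme_ns_cmd_read(ns, qpair, buf,
				   0 /* starting LBA */, 1 /* block count */,
				   read_complete, NULL, 0 /* io_flags */);
	if (rc == 0) {
		/* Busy-poll the queue pair until the completion arrives. */
		while (!g_read_done) {
			spdk_nvme_qpair_process_completions(qpair, 0);
		}
	}

	spdk_dma_free(buf);
	spdk_nvme_ctrlr_free_io_qpair(qpair);
	return rc;
}
-- cut here --
In particular, I am not sure whether allocating the data buffer with
spdk_dma_zmalloc() and polling the qpair as above is the expected pattern.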
I have two nvmf subsystems. Connecting to the first one works fine:
nvme connect -t rdma -n nqn.2014-08.org.spdk:cnode1 -a 10.0.2.15 -s 4260
When I try to connect to the second one I get an error:
Failed to write to /dev/nvme-fabrics: Cannot allocate memory
In dmesg I have:
[246436.995226] nvme nvme1: creating 10 I/O queues.
[246437.042307] nvme nvme1: failed to initialize MR pool sized 128 for QID 6
[246437.044734] nvme nvme1: rdma connection establishment failed (-12)
I can connect to the second nvmf subsystem when I use the -Q (queue size) parameter, presumably because the smaller queues avoid the failing 128-entry MR pool allocation:
nvme connect -t rdma -n nqn.2014-08.org.spdk:cnode2 -a 10.0.2.15 -s 4260 -Q 32
Intel Technology Poland sp. z o.o.
Is it correct that 2 processes cannot access SPDK simultaneously? For example:
- Start 2 terminals (root).
- On terminal 1, run the nvme_manage example, select #1 to list controllers, but don't exit.
- On terminal 2, run the identify example. Result: "EAL: FATAL: Cannot get hugepage information."
- If we end the nvme_manage process, then identify completes okay.
The context is that I'm using Golang to interact with SPDK (using simple bindings to execute routines similar to the examples) and expected that I could run different tasks (sequentially) from goroutines. Example tasks include firmware update and executing fio workloads with the SPDK fio_plugin example. What I'm finding is that even when executed within a goroutine (implemented with ULTs), the necessary resources are not released afterwards, and then I can't issue a subsequent SPDK task because I hit FATAL failures similar to the one listed above.
One solution might be to fork separate processes to perform the required tasks and build executables to do so, but I don't want to have to resort to that.
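For reference, this is roughly the environment setup the bindings wrap (a
simplified sketch, not my actual code). I am not sure whether the shm_id
field, which the identify and perf examples expose as -i, is relevant to
letting a second process attach to the same resources:
-- cut here --
#include "spdk/stdinc.h"
#include "spdk/env.h"

/*
 * Simplified sketch of the environment setup done before calling into the
 * NVMe routines. Whether setting shm_id is enough for a second process to
 * share the hugepage/NVMe resources is exactly what I am unsure about.
 */
int init_spdk_env(void)
{
	struct spdk_env_opts opts;

	spdk_env_opts_init(&opts);
	opts.name = "go_spdk_bindings";   /* placeholder name */
	opts.shm_id = 1;                  /* shared memory group ID (assumption) */

	if (spdk_env_init(&opts) < 0) {
		fprintf(stderr, "Unable to initialize SPDK env\n");
		return -1;
	}
	return 0;
}
-- cut here --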
Tom Nabarro BEng (hons) MIET
M: +44 (0)7786 260986
Intel Corporation (UK) Limited