Recently the following was committed:
bdev: add block device abstraction layer
Signed-off-by: Daniel Verkamp <daniel.verkamp(a)intel.com>
Can someone explain the target use case for this? Is it to provide block-like
access using SPDK?
Thank you Ben for the detailed reply.
A filesystem which can make use of SPDK is precisely the requirement.
Everything else is a way to get around that. In my specific use case I wish
to have a single NVMe device which will have a rootfs as well, so such a
filesystem will need to handle that too (probably I am being too
ambitious here). The only other "filesystem" that I am aware of is Ceph's
BlueFS, which is very minimal and specific to the RocksDB backend. On a side
note, if I had more than one NVMe device on a system, do all the NVMe
devices need to be unbound from the kernel driver?
> On 7/14/16, 10:55 AM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:
> >On Wed, 2016-07-13 at 12:59 -0700, txcy uio wrote:
> >> Hello Ben
> >> I have a use case where I want to attach one namespace of an NVMe device to the SPDK driver and use the
> >> other namespace as a kernel block device to create a regular filesystem. The current implementation of
> >> SPDK requires the device to be unbound completely from the native kernel driver. I was wondering
> >> if this is at all possible and, if yes, whether it can be accomplished with the current SPDK
> >> implementation?
> >Your request is one we get every few days or so, and it is a perfectly reasonable thing to ask. I
> >haven't written down my standard response on the mailing list yet, so I'm going to take this
> >opportunity to lay out our position for all to see and discuss.
> >From a purely technical standpoint, it is impossible to load both the SPDK driver as it exists today
> >and the kernel driver against the same PCI device. The registers exposed by the PCI device contain
> >global state, and so there can only be a single "owner". There is an established hardware mechanism,
> >called SR-IOV, for creating multiple virtual PCI devices from a single physical device, each of which
> >can load its own driver. This is typically used by NICs today, and I'm not aware of any NVMe SSDs
> >that currently support it. In the long term, though, SR-IOV is the right solution for sharing the
> >device like you outline.
> >In the short term, it would be technically possible to create some kernel patches that add entries
> >to sysfs or provide ioctls that allow a user space process to claim an NVMe hardware queue for a
> >device that the kernel is managing. You could then run the SPDK driver's I/O path against that
> >queue. Unfortunately, there are two insurmountable issues with this strategy. First, NVMe hardware
> >queues can write to any namespace on the device. Therefore, you couldn't enforce that the queue can
> >only write to the namespace you are intending. You couldn't even enforce that the queue is only used
> >for reads - you basically just have to trust the application to only do reasonable things. Second,
> >the device is owned by the kernel and therefore is not in an IOMMU protection domain with this
> >strategy. The device can directly access the DMA engine, and with a small amount of work, you could
> >hijack that DMA engine to copy data to wherever you wanted on the system. For these two reasons,
> >patches of this nature would never be accepted into the mainline kernel. The SPDK team can't be in
> >the business of supporting patches that have been rejected by the kernel.
> >Clearly, lots of people have requested to share a device between the kernel and SPDK, so I've been
> >trying to uncover all of the reasons they may want to do that. So far, in every case, it boils down
> >to not having a filesystem for use with SPDK. I'm hoping to steer the community toward solving the
> >problem of not having a filesystem rather than trying to share the device. I'm not advocating for
> >writing a (mostly) POSIX-compliant filesystem, but I do think there is a small core of functionality
> >that most databases or storage applications all require. These are things like allocating blocks into
> >some unit (I've been calling it a blob) that has a name and is persistent and rediscoverable across
> >reboots. Writing this layer requires some serious thought - SPDK is fast in no small part because it
> >is purely asynchronous, polled, and lockless - so this layer would need to preserve those properties.
> >Sorry for the very long response, but I wanted to document my current thoughts on the mailing list
> >for all to see.
> >> --Tyc
> >> _______________________________________________
> >> SPDK mailing list
> >> SPDK(a)lists.01.org
> >> https://lists.01.org/mailman/listinfo/spdk
I am a development lead engineer at HCL Technologies Ltd. My team has
experience in NVMe driver development. Currently my team is planning to
develop an nvmf initiator, and I would like to contribute that work to
open SPDK. Please let me know the procedure and expectations for
contributing the nvmf initiator.
We have written a test application that uses the SPDK library to benchmark a set of 3 Intel P3700 drives and a single 750 drive (concurrently). We've done some testing using fio and the kernel NVMe driver and have had no problem achieving the claimed IOPS (4k random read) of all drives on our system.
What we have found during our testing is that SPDK will sometimes start to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the following situations:
1. Testing a single drive and passing in 0 for max_completions to spdk_nvme_qpair_process_completions(). We haven’t seen any issues with single drive testing when max_completions was > 0.
2. Testing all four drives at once will result in one drive failing to receive callbacks, seemingly regardless of what number we pass for max_completions (1 through 128).
Here are other observations we've made:
- When the callbacks fail to be called for a drive, they fail to be called for the remaining duration of the test.
- The drive that 'fails' when testing 4 drives concurrently varies from test to test.
- 'Failure' of a drive seems to be correlated with the number of outstanding read operations, though it is not a strict correlation.
Our system is a dual-socket E5-2630 v3. One drive is on a PCI slot for CPU 0 and the other 3 are on PCI slots on CPU 1. The master/slave threads are on the same CPU socket as the NVMe device they are talking to.
We’d like to know what is causing this issue and what we can do to help investigate the problem. What other information can we provide? Is there some part of the spdk code that we can look at to help determine the cause?