The back-end device is malloc0, which is a memory-backed bdev running in the "vhost" application's
address space. It is not over NVMe-oF.
I guess the bio pages are already pinned, because the same buffers are sent to the lower layers to
do DMA. Let's say we have written a lightweight ebay block driver in the kernel. The flow would be:
1. SPDK reserves the virtual address space and passes it to the ebay block driver to mmap. This step
happens once during startup.
2. For every IO, the ebay block driver maps the buffers into that virtual space and passes the IO
information to SPDK through shared queues (a sketch of one such queue entry follows below).
3. SPDK reads it from the shared queue and passes the same virtual address down to do RDMA.
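As a rough illustration, a single shared-queue entry could look something like the sketch below;
all of the names are hypothetical, since the ebay block driver doesn't exist yet:

    #include <stdint.h>

    /* Hypothetical layout of one entry in the kernel/SPDK shared queue.
     * virt_addr points into the region SPDK reserved at startup and
     * handed to the ebay block driver for mmap (step 1). */
    struct ebay_io_desc {
        uint64_t virt_addr;  /* buffer address in SPDK's address space */
        uint64_t offset;     /* byte offset on the backing bdev */
        uint32_t length;     /* transfer length in bytes */
        uint8_t  opcode;     /* 0 = read, 1 = write */
        uint8_t  rsvd[3];
        uint64_t cookie;     /* kernel tag echoed back on completion */
    };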
A couple of things that I am not really sure about in this flow:
1. How memory registration is going to work with the RDMA driver.
2. What changes are required in SPDK memory management.
On 8/13/19, 2:45 PM, "Harris, James R" <james.r.harris(a)intel.com> wrote:
The idea is technically feasible, but I think you would find the cost of pinning the
pages plus mapping them into the SPDK process would far exceed the cost of the kernel/user copy.
From your original e-mail - could you clarify what the 50us is measuring? For
example, does this include the NVMe-oF round trip? And if so, what is the backing device
for the namespace on the target side?
On 8/13/19, 12:55 PM, "Mittal, Rishabh" <rimittal(a)ebay.com> wrote:
I don't have any profiling data. I am not really worried about the system calls,
because I think we could find a way to optimize them. I am really worried about the bcopy: how
can we avoid bcopying from kernel to user space?
The other idea we have is to map the physical address of a buffer in the bio into SPDK
virtual memory. We would have to modify the nbd driver or write a new lightweight driver for this.
Do you think something like that is feasible to do in SPDK?
On 8/12/19, 11:42 AM, "Harris, James R" <james.r.harris(a)intel.com> wrote:
On 8/12/19, 11:20 AM, "SPDK on behalf of Harris, James R"
<spdk-bounces(a)lists.01.org on behalf of james.r.harris(a)intel.com> wrote:
On 8/12/19, 9:20 AM, "SPDK on behalf of Mittal, Rishabh via
SPDK" <spdk-bounces(a)lists.01.org on behalf of spdk(a)lists.01.org> wrote:
> As I'm sure you're aware, SPDK apps use spdk_alloc() with the SPDK_MALLOC_DMA flag,
which is backed by huge pages that are effectively pinned already. SPDK does virt-to-phys
translation on memory allocated this way very efficiently using spdk_vtophys(). It would be
an interesting experiment though. Your app is not in a VM, right?
I am thinking of passing the physical addresses of the buffers in the bio to
SPDK. I don't know if they are already pinned by the kernel or if we need to pin them
explicitly. Also, SPDK has some requirements on the alignment of physical addresses; I don't
know if the addresses in a bio conform to those requirements.
SPDK won't be running in a VM.
SPDK relies on data buffers being mapped into the SPDK application's
address space and passed as virtual addresses throughout the SPDK stack. Once a
buffer reaches a module that requires a physical address (such as the NVMe driver for a
PCIe-attached device), SPDK translates the virtual address to a physical address. Note
that the NVMe fabrics transports (RDMA and TCP) both deal with virtual addresses, not
physical addresses. The RDMA transport is built on top of ibverbs, where we register
virtual address ranges as memory regions for describing data transfers.
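(For reference, a minimal sketch of what that registration looks like at the ibverbs level;
pd is assumed to be a protection domain obtained earlier from ibv_alloc_pd():)

    #include <stddef.h>
    #include <infiniband/verbs.h>

    /* Register a virtual address range as an RDMA memory region.
     * ibv_reg_mr() pins the pages and returns the lkey/rkey that
     * subsequent work requests use to describe this buffer. */
    static struct ibv_mr *
    register_buffer(struct ibv_pd *pd, void *buf, size_t len)
    {
        return ibv_reg_mr(pd, buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }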
So for nbd, pinning the buffers and getting the physical address(es) to
SPDK wouldn't be enough. Those physical address regions would also need to get
dynamically mapped into the SPDK address space.
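A minimal sketch of that mapping step, assuming the region has already been mmap'ed into the
SPDK process and using the public spdk_mem_register() hook (which is how foreign memory is
normally added to SPDK's translation map; note that current SPDK requires 2 MB alignment here):

    #include "spdk/env.h"

    /* Add an already-mapped, pinned region to SPDK's memory map so that
     * spdk_vtophys() and the transports can see it. vaddr and len must
     * be 2 MB aligned; pair with spdk_mem_unregister() on teardown. */
    static int
    map_foreign_region(void *vaddr, size_t len)
    {
        return spdk_mem_register(vaddr, len);
    }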
Do you have any profiling data that shows the relative cost of the data
copy v. the system calls themselves on your system? There may be some optimization
opportunities on the system calls to look at as well.
Could you also clarify what the 50us is measuring? For example, does this
include the NVMe-oF round trip? And if so, what is the backing device for the namespace
on the target side?
From: "Luse, Paul E" <paul.e.luse(a)intel.com>
Date: Sunday, August 11, 2019 at 12:53 PM
To: "Mittal, Rishabh" <rimittal(a)ebay.com>,
Cc: "Kadayam, Hari" <hkadayam(a)ebay.com>, "Chen,
Xiaoxi" <xiaoxchen(a)ebay.com>, "Szmyd, Brian"
Subject: RE: NBD with SPDK
Thanks for the question. I was talking to Jim and Ben about this a
bit; one of them may want to elaborate, but we're thinking the cost of mmap, plus making
sure the memory is pinned, is probably prohibitive. As I'm sure you're aware, SPDK apps use
spdk_alloc() with the SPDK_MALLOC_DMA flag, which is backed by huge pages that are effectively
pinned already. SPDK does virt-to-phys translation on memory allocated this way very
efficiently using spdk_vtophys(). It would be an interesting experiment though. Your app
is not in a VM, right?
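A minimal sketch of that allocation pattern, assuming a recent SPDK where the allocation
function is spdk_malloc() and spdk_vtophys() takes an optional size out-parameter:

    #include "spdk/env.h"

    static void
    dma_alloc_example(void)
    {
        /* Pinned, hugepage-backed allocation via SPDK_MALLOC_DMA. */
        void *buf = spdk_malloc(4096, 0x1000 /* 4 KB alignment */, NULL,
                                SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
        if (buf == NULL) {
            return;
        }

        /* No syscall here: spdk_vtophys() walks SPDK's own memory map. */
        uint64_t phys = spdk_vtophys(buf, NULL);
        if (phys == SPDK_VTOPHYS_ERROR) {
            spdk_free(buf);
            return;
        }

        /* phys can now be handed directly to a PCIe device for DMA. */
        spdk_free(buf);
    }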
From: Mittal, Rishabh [mailto:email@example.com]
Sent: Saturday, August 10, 2019 6:09 PM
Cc: Luse, Paul E <paul.e.luse(a)intel.com>; Kadayam, Hari
<hkadayam(a)ebay.com>; Chen, Xiaoxi <xiaoxchen(a)ebay.com>; Szmyd, Brian
Subject: NBD with SPDK
We are trying to use NBD and SPDK on the client side. The data path looks like:
File System ----> NBD client ----> SPDK ----> NVMe-oF
Currently we are seeing a high latency, on the order of 50 us, on
this path. It seems like a data buffer copy is happening for write commands from
kernel to user space when the SPDK nbd module reads data from the nbd socket.
I think there could be two ways to prevent the data copy:
1. Memory-map the kernel buffers into SPDK virtual space. I am
not sure if it is possible to mmap a buffer, or what the impact is of calling mmap for each
IO (a rough sketch follows below).
2. Have the NBD kernel driver give the physical address of a buffer, and have SPDK use
that to DMA it to NVMe-oF. I think SPDK must already be translating a virtual address to a
physical address before sending it to NVMe-oF.
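On option 1 specifically, the usual kernel-side pattern would be an mmap handler in the
driver built on remap_pfn_range(); a rough sketch under that assumption, where ebay_buf_pfn
and ebay_buf_len are hypothetical driver state for a pinned, physically contiguous buffer:

    #include <linux/fs.h>
    #include <linux/mm.h>

    static unsigned long ebay_buf_pfn;  /* PFN of the pinned buffer (hypothetical) */
    static unsigned long ebay_buf_len;  /* buffer length in bytes (hypothetical) */

    /* Rough sketch of option 1: map the pinned kernel buffer into the
     * calling process's address space via the driver's mmap handler. */
    static int
    ebay_blk_mmap(struct file *filp, struct vm_area_struct *vma)
    {
        unsigned long size = vma->vm_end - vma->vm_start;

        if (size > ebay_buf_len)
            return -EINVAL;

        return remap_pfn_range(vma, vma->vm_start, ebay_buf_pfn,
                               size, vma->vm_page_prot);
    }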
Option 2 makes more sense to me. Please let me know if option 2 is
feasible in SPDK.