I have a two-part question relating to performance benchmarking of SPDK and the fio_plugin.
1. Is there a reference document specifying the workload types, queue depths, and block sizes used to benchmark the published SPDK performance numbers?
Is any OS-level performance tuning needed? It would be great to get some insight into the performance testbed used.
I did find the DPDK performance optimisation guidelines at https://lists.01.org/pipermail/spdk/2016-June/000035.html, which are useful.
2. I am trying the fio_plugin for benchmarking performance, and the jobs are completing with the following error:
“nvme_pcie.c: 996:nvme_pcie_qpair_complete_pending_admin_request: ***ERROR*** the active process (pid 6700) is not found for this controller.”
I found that nvme_pcie_qpair_complete_pending_admin_request() checks whether the process exists; this is where the error message comes from.
I am not sure how this process is getting killed before completion. There is no other operation being done on this system apart from running the fio plugin.
Has any similar issue been seen in the past that might help get around this error?
Below are my setup details:
OS: Fedora 25 (Server Edition)
Kernel version: 4.8.6-300
DPDK version: 6.11
I have attached a single 745 GB Intel NVMe drive, on which I am running fio.
The workarounds below were tried and the issue still persists; I'm not sure how to get around this.
1. Tried different workloads in fio
2. Detached the NVMe drive and attached a new NVMe drive
3. Re-installed DPDK, SPDK, and fio
The following links were used to install and set up SPDK and the fio plugin:
https://github.com/spdk/spdk —> SPDK
https://github.com/spdk/spdk/tree/master/examples/nvme/fio_plugin —> fio_plugin
I am trying to find memory errors in memory blocks allocated by
rte_malloc in my SPDK application, such as buffer overruns, memory
leaks, etc.
I found a version of Valgrind modified for DPDK. Thankfully, it works for SPDK, too.
But this tool does not seem to work for memory blocks allocated by rte_malloc.
As an alternative, does SPDK or DPDK have a mechanism to support
memory-error troubleshooting, such as putting magic words at memory
boundaries or reporting the current number of rte_malloc'ed bytes?
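In the meantime I have been experimenting with a thin wrapper of my own along these lines. This is only a sketch; the guard-word layout and the byte counter are my own debugging aid, not anything DPDK provides:

/* Sketch: guard words and a byte counter layered on rte_malloc()/rte_free().
 * Single-threaded debugging aid only; the counter is not atomic, and the
 * header shifts the user pointer by 16 bytes, so alignments > 16 are not
 * preserved. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <rte_malloc.h>

#define GUARD_MAGIC 0xDEADBEEFCAFEF00DULL

static uint64_t g_alloc_bytes; /* running total of outstanding user bytes */

struct guard_hdr {
    uint64_t magic;
    size_t   size;
};

static void *
dbg_malloc(const char *type, size_t size, unsigned align)
{
    /* header in front of the user region, trailing magic word after it */
    struct guard_hdr *h = rte_malloc(type,
        sizeof(*h) + size + sizeof(uint64_t), align);
    uint64_t tail = GUARD_MAGIC;

    if (h == NULL) {
        return NULL;
    }
    h->magic = GUARD_MAGIC;
    h->size = size;
    memcpy((char *)(h + 1) + size, &tail, sizeof(tail));
    g_alloc_bytes += size;
    return h + 1;
}

static void
dbg_free(void *p)
{
    struct guard_hdr *h;
    uint64_t tail;

    if (p == NULL) {
        return;
    }
    h = (struct guard_hdr *)p - 1;
    memcpy(&tail, (char *)p + h->size, sizeof(tail));
    if (h->magic != GUARD_MAGIC || tail != GUARD_MAGIC) {
        fprintf(stderr, "guard corrupted for %p (size %zu)\n", p, h->size);
    }
    g_alloc_bytes -= h->size;
    rte_free(h);
}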
I couldn't find any support for NVMe-oF multipathing, if there is any. I
have an SPDK target connected to a standard kernel-based initiator.
Could you please tell me whether there is any plan to push it into the kernel?
Or is there any project being carried out, outside the mainstream, that
is available for experimentation?
As per my understanding, using SPDK we can achieve the full performance of an SSD
with one core, and there is one queue pair per thread and one thread per core.
Typically, applications span multiple threads. As locking is discouraged
in SPDK, we have to map each thread to a queue pair. By doing so, we might
end up keeping all the cores busy just for I/O polling. How do you
compare this scenario with kernel mode in terms of overall performance and
utilization? How do we control CPU utilization without locks while
serving an application that spans multiple threads?
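For context, the alternative I have been considering is to let application threads hand requests off to a small number of dedicated polling threads through per-thread single-producer/single-consumer rings, so no locks are needed and only the polling cores spin on the queue pairs. A minimal sketch of that hand-off is below; it is my own illustration, not an SPDK API, and the qpair submission is only indicated in comments:

/* Sketch: lock-free SPSC ring used to pass I/O requests from one
 * application thread to one dedicated polling thread. One ring per
 * (application thread, polling thread) pair, so no locks are needed.
 * The ring must be zero-initialized before use. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024 /* must be a power of two */

struct io_request {
    void     *buf;
    uint64_t  lba;
    uint32_t  num_blocks;
};

struct spsc_ring {
    struct io_request *slots[RING_SIZE];
    _Atomic size_t head; /* written only by the producer (app thread) */
    _Atomic size_t tail; /* written only by the consumer (polling thread) */
};

/* Called by the application thread. */
static bool
ring_push(struct spsc_ring *r, struct io_request *req)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE) {
        return false; /* full -- caller can retry or back off */
    }
    r->slots[head & (RING_SIZE - 1)] = req;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Called only by the polling thread, which owns the NVMe queue pair. */
static struct io_request *
ring_pop(struct spsc_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    struct io_request *req;

    if (tail == head) {
        return NULL; /* empty */
    }
    req = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return req;
}

/* The polling thread would drain each application thread's ring, submit
 * the requests on its own queue pair (e.g. spdk_nvme_ns_cmd_read()), and
 * call spdk_nvme_qpair_process_completions() in the same loop. */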
I am attempting to use the SPDK blob store to implement a basic NVMe-based flat file store. I understand that this is a new addition to the SPDK that is under active development and that documentation/examples of usage are sparse. But this is a great new addition to the SPDK that I've been tracking and so I'm eager to begin using it.
With that being said, I've been scouring through its usage in the bdev component, as well as the test cases, in an attempt to glean how I might integrate it into my code base (specifically, I am already successfully using SPDK to interact with NVMe devices), but I have a few high-level questions that I hope are easy to answer.
1) In the most basic usage, it seems IO channels should be 1-to-1 with threads. It looks like I must start a thread, call spdk_allocate_thread(), then spdk_get_io_channel() to get the spdk_io_channel instance created and associated with that thread.
Since spdk_bs_dev.create_channel is synchronous, it looks like I must block the create_channel() call while the above is happening in the new IO thread (a rough sketch of the hand-off I have in mind follows these questions). Is this a reasonable approach, or am I misinterpreting how IO channels are intended to work?
2) I've already got a set of IO threads for executing asynchronous NVMe operations (e.g. spdk_nvme_ns_cmd_read(...)) against one or more devices. These IO threads each own a set of NVMe queue pairs, and have queuing mechanisms allowing for the submission of work to be performed against a specific device. Given this, I am interpreting an IO channel to essentially be an additional "outer" queue of pending blob-IO operations that are processed by an additional, dedicated thread. A call to spdk_bs_dev.read() or .write() would find the correct IO channel thread, enqueue an "outer" blob op, and the channel IO thread would then enqueue one or more lower-level NVMe IO operations on the "inner" queue. Does this interpretation match the intended usage? Am I missing something?
3) spdk_bs_dev.unmap() appears to correspond to dealloc/TRIM. Is this correct?
4) I've read through the docs at http://www.spdk.io/doc/blob.html and understand at a high level how things are being stored on disk, but there are references to the caching of metadata. My current workload will likely generate on the order of 100K to 1M blobs of sizes ranging from 512KB to 32MB, each with a couple of small attributes. Is there any way to estimate the total size (in memory) of the cache? Also, are any metadata modifications O(n) in the number of blobs?
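To make question 1 concrete, here is roughly the blocking hand-off I have in mind for create_channel(). The pthread scaffolding is real, but the SPDK calls themselves are only indicated in comments, since I am not certain of the exact signatures in the current tree:

/* Sketch: block the create_channel() caller until a newly spawned I/O
 * thread has completed its per-thread SPDK setup. Plain pthreads; the
 * SPDK calls are indicated in comments only. */
#include <pthread.h>
#include <stdbool.h>

struct channel_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cond;
    bool            ready;
    void           *io_channel; /* would hold the struct spdk_io_channel * */
};

static void *
io_thread_main(void *arg)
{
    struct channel_ctx *ctx = arg;

    /* Per-thread SPDK setup would go here, e.g.:
     *   spdk_allocate_thread(...);
     *   ctx->io_channel = spdk_get_io_channel(bs_dev);
     */

    pthread_mutex_lock(&ctx->lock);
    ctx->ready = true;
    pthread_cond_signal(&ctx->ready_cond);
    pthread_mutex_unlock(&ctx->lock);

    /* ...then enter this thread's polling / queuing loop... */
    return NULL;
}

/* Called from spdk_bs_dev.create_channel(): spawn the I/O thread and
 * block until its channel exists, then hand it back to the blobstore. */
static void *
create_channel_blocking(struct channel_ctx *ctx)
{
    pthread_t tid;

    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->ready_cond, NULL);
    ctx->ready = false;

    pthread_create(&tid, NULL, io_thread_main, ctx);

    pthread_mutex_lock(&ctx->lock);
    while (!ctx->ready) {
        pthread_cond_wait(&ctx->ready_cond, &ctx->lock);
    }
    pthread_mutex_unlock(&ctx->lock);

    return ctx->io_channel;
}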
Thanks in advance for any help or insight anyone can provide. Any assistance is greatly appreciated.
- George Kondiles
Intel Builders Developer Summit Featuring SPDK & ISA-L
Meet with key contributors to SPDK, learn details about the latest and upcoming SPDK ingredients, and participate in roundtable discussions that will shape the SPDK community. The two-day event also includes introductions to adjacent software libraries and opportunities for informal networking and private discussions.
April 19, 2017 - April 20, 2017
Hyatt Regency Santa Clara
5101 Great America Parkway
Santa Clara, CA 95054
April 19 - 9 am - 6 pm
April 20 - 9 am - 3 pm
Registration Deadline: March 28, 2017
Agenda Topics (Subject to Change)
Blobstore: A local, persistent, power-fail safe block allocator designed to replace filesystem usage in many popular databases.
VM I/O Efficiency: SPDK takes QEMU VM I/O efficiency one big step further with the SPDK vhost-scsi Target.
NVMe over Fabric: A deep-dive into the new flash-native block protocol, covering the use cases and capabilities of the SPDK NVMe-oF ingredients.
Under the Hood: Unpacking the design choices that lead to high-performance storage ingredients.
Performance Test & Tune: The SPDK performance team shares best practices and tips.
Community Roundtable: Loosely moderated discussion focused on process and governance.
We are using SPDK on the target end but a kernel initiator. We used fio, but
found some unexpected results: the latency figures are almost the same for the
SPDK target and the kernel target. Moreover, we are using null block devices, so
media latency is not involved here.
So I would like to check certain fio parameters, in case we are missing
something. Here are a couple of samples from an fio "read I/O" run (SPDK NVMe-oF target, kernel host).
(fio command line:
fio --bs=4k --numjobs=16 --iodepth=4 --loops=1 --ioengine=libaio --direct=1
--invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based
--runtime=10 --filename=/dev/nvme0n1 --name=read-phase --rw=read )
slat (usec): min=1 , max=43 , avg= 3.61, stdev= 1.77
clat (usec): min=40 , max=414 , avg=233.02, stdev=28.75
lat (usec): min=47 , max=416 , avg=236.73, stdev=28.79
slat (usec): min=1 , max=41 , avg= 3.63, stdev= 1.77
clat (usec): min=71 , max=473 , avg=232.60, stdev=28.65
lat (usec): min=74 , max=476 , avg=236.33, stdev=28.70
slat (usec): min=1 , max=48 , avg= 3.64, stdev= 1.82
clat (usec): min=17 , max=477 , avg=233.10, stdev=28.87
lat (usec): min=21 , max=479 , avg=236.84, stdev=28.93
All of the above samples (in fact, every sample) show a maximum clat *above 380us*.
Also, the average clat is *never below 230us*.
We also compared this against a kernel target (and kernel host), and apparently we
didn't see much difference between the two runs.
The max clat was in almost the same range (only 10-20 usec higher than the SPDK
target), and the average clat was almost exactly the same (230-240 us).
Is this really expected? (I believe it's not!)
So, are we missing any fio flag(s)?
On Mon, 2017-03-06 at 23:04 +0530, Rajat Maheshwari wrote:
> Which supported I/O utilities are available for NVMe-over-Fabrics
> latency analysis?
> Also, do all of these applications support parallel exploitation of
> NVMe queues?
The majority of people use fio. Are you planning to use the SPDK NVMe-
oF target, the NVMe-oF host, or both? If you are using just the SPDK
NVMe-oF target with the kernel host, fio is an excellent tool.
I was looking at the function spdk_nvmf_session_gen_cntlid(). It looks like
the cntlid is unique across all NVMe subsystems.
I thought the cntlid was unique within a subsystem, as per the spec. Am I missing something?
[1st email sent to wrong mailing list address]
From: Paul Von-Stamwitz
Sent: Tuesday, March 21, 2017 8:24 PM
Cc: Verkamp, Daniel (daniel.verkamp(a)intel.com); Walker, Benjamin (benjamin.walker(a)intel.com)
Subject: bdev/NVMe pass-though command
Hi Ben and Daniel,
We had an off-line discussion on implementing an NVMe pass-through command at the bdev level, and I thought to include the community in the discussion. Our primary use case is the retrieval of SMART/Health information via Get Log Page, but it could be used for other purposes.
How do you envision this?
Should the upper layer send down a raw NVMe command which gets passed down to blockdev_nvme and is handled similarly to nvmf/direct.c?
Since multiple bdev contexts can share the same admin queue pair, should we limit which context is allowed to use the pass-through?
Technically we could have an I/O pass-through, but I think we should limit it to admin commands.
Should we put checks on what is allowed (i.e. read-only commands) or let anything go through?
I would appreciate your thoughts, since we would like to get started on this.
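To make the discussion concrete, below is a rough sketch of the kind of call I picture at the bdev layer for the SMART/Health use case. The function name and signature are made up for illustration (no such bdev API exists today); only the spdk_nvme_cmd and log-page plumbing comes from the existing NVMe spec headers:

/* Hypothetical bdev-level admin pass-through, sketched for discussion.
 * spdk_bdev_nvme_admin_passthru_hypothetical() does not exist; the name
 * and signature are placeholders for whatever we agree on. */
#include <stdint.h>
#include <string.h>
#include "spdk/bdev.h"
#include "spdk/nvme_spec.h"

typedef void (*admin_passthru_cb)(void *cb_arg, const struct spdk_nvme_cpl *cpl);

/* Placeholder prototype for the proposed API. */
int spdk_bdev_nvme_admin_passthru_hypothetical(struct spdk_bdev *bdev,
                                               const struct spdk_nvme_cmd *cmd,
                                               void *buf, size_t nbytes,
                                               admin_passthru_cb cb, void *cb_arg);

/* Example caller: retrieve the SMART / Health Information log page (02h). */
static int
get_health_log(struct spdk_bdev *bdev,
               struct spdk_nvme_health_information_page *health,
               admin_passthru_cb cb, void *cb_arg)
{
    struct spdk_nvme_cmd cmd;
    /* 0's-based dword count of the log page */
    uint32_t numd = sizeof(*health) / sizeof(uint32_t) - 1;

    memset(&cmd, 0, sizeof(cmd));
    cmd.opc = SPDK_NVME_OPC_GET_LOG_PAGE;
    cmd.nsid = 0xFFFFFFFF; /* controller-wide log */
    cmd.cdw10 = (numd << 16) | SPDK_NVME_LOG_HEALTH_INFORMATION;

    return spdk_bdev_nvme_admin_passthru_hypothetical(bdev, &cmd, health,
                                                      sizeof(*health), cb, cb_arg);
}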
I am trying to test multipathing-related scenarios in NVMe-oF and had a few
questions:
1) Have any code changes (drivers) related to NVMe-oF multipathing been
pushed into the Linux kernel (4.9 or above)? (I am using 4.9.3.)
2) If not, is there any independent project available which has NVMe-oF
multipathing-related changes?
Thanks in advance!