I’ve been digging into the NVMe-oF target tests today – starting with looking at how long it takes to run all of the per-patch tests. I’d like to consider a new strategy for using rpc.py from our test scripts to reduce the number of times we start the Python interpreter. There’s a non-trivial amount of overhead just related to starting the interpreter – and tests like nvmf/shutdown do 30 RPCs (10 subsystems * 3 RPCs to set up each subsystem). If it’s only 100ms overhead per RPC, that’s 3 seconds wasted*.
There are two approaches I can see:
1. Save the configuration in JSON RPC format, and check the file into the test directory. Then we just have a single RPC to call (load_config).
2. Add a new top level function to rpc.py called “run_from_file”. This is not an RPC – it just loads a file, and runs each line as if it were its own rpc.py invocation.
* A variation could also be piping this file through stdin
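A minimal sketch of what approach #2 could look like, assuming each line of the file holds the arguments of one rpc.py invocation. The `dispatch` callable and the command names in the demo are placeholders, not anything rpc.py has today; in rpc.py itself, `dispatch` would presumably be the existing argument parser plus the selected handler.

```python
import io
import shlex


def run_from_file(stream, dispatch):
    """Run every non-empty, non-comment line as one rpc.py-style call.

    `dispatch` is a hypothetical stand-in for "parse these arguments
    and issue the RPC"; it receives the argument list for one line.
    Returns the number of lines that were dispatched.
    """
    count = 0
    for line in stream:
        # Shell-style splitting; text after '#' is treated as a comment.
        argv = shlex.split(line, comments=True)
        if not argv:
            continue  # blank or comment-only line
        dispatch(argv)
        count += 1
    return count


# Demo with a recording dispatcher; the command names are made up.
calls = []
script = io.StringIO(
    "# set up one subsystem\n"
    "example_create_subsystem nqn.2016-06.io.spdk:cnode1\n"
    "\n"
    "example_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma\n"
)
ran = run_from_file(script, calls.append)
print(ran, calls)
```

Because the function takes any iterable of lines, the stdin variation is the same call with `sys.stdin` passed as `stream`.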
I think #2 is probably better. At least for nvmf/shutdown, we use a different number of subsystems based on whether we’re testing with a real RDMA NIC or with soft-roce. In those cases, we’d need two separate JSON RPC format files.
This would also be helpful for adding more stressful tests – for example, creating 1000 subsystems or 1000 logical volumes. The SPDK application itself can do this pretty quickly, but I think our current strategy of building that config one RPC at a time would force such a test to be run only nightly because of how long it takes.
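As an illustration of why a line-per-RPC file suits both the variable subsystem count and the 1000-subsystem stress case, the file can be generated from a single parameter. This is only a sketch: the three command names per subsystem are placeholders, not real rpc.py commands, and the counts in the demo are examples.

```python
def subsystem_script(count):
    """Build a line-per-RPC script that sets up `count` subsystems.

    Each subsystem gets three lines, mirroring the "3 RPCs per
    subsystem" pattern. The command names are hypothetical.
    """
    lines = []
    for i in range(1, count + 1):
        nqn = f"nqn.2016-06.io.spdk:cnode{i}"
        lines.append(f"example_create_bdev Malloc{i}")
        lines.append(f"example_create_subsystem {nqn}")
        lines.append(f"example_add_namespace {nqn} Malloc{i}")
    return "\n".join(lines) + "\n"


# The same generator covers a small functional test and a stress test.
small = subsystem_script(10)
stress = subsystem_script(1000)
print(len(small.splitlines()), len(stress.splitlines()))
```

The generated text could then be fed to something like run_from_file in a single interpreter start, rather than 3000 separate ones.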
I’ll probably try putting something together tomorrow, but wanted to see if anyone else had any additional thoughts or comments.
* I know 3 seconds doesn’t sound like much. It’s probably more like 5 or 6 seconds in this case. We probably run about 8 patches per hour through the test pool – so every 7-8 seconds saved is a 1% improvement in test pool throughput. There are several other places we could use this – especially vhost and iscsi.
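For anyone who wants to check the per-invocation figure on their own test machine, here is a rough way to measure it. It times only a bare interpreter start (`python -c pass`), so it is a lower bound: rpc.py also pays for its imports and for connecting to the RPC socket. The run count of 5 is an arbitrary choice.

```python
import subprocess
import sys
import time


def interpreter_startup_seconds(runs=5):
    """Average wall-clock time to start and exit a Python interpreter."""
    start = time.perf_counter()
    for _ in range(runs):
        # Start the same interpreter that is running this script.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


per_call = interpreter_startup_seconds()
rpcs = 10 * 3  # 10 subsystems * 3 RPCs each, as in nvmf/shutdown
print(f"{per_call * 1000:.0f} ms per start, "
      f"{per_call * rpcs:.2f} s for {rpcs} RPCs")
```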
Hi SPDK community,
We at eBay are actively engaged with the Intel SPDK team on several solutions, one of which is to use NVMe-oF over TCP for a high-performance cloud storage solution.
What makes our solution a bit different, and rules out the standard NVMe driver, is that we would like to drive some high-level storage features (e.g. drive pooling, replication, snapshots) from the storage clients rather than from the more traditional target side, so that we can scale horizontally across storage nodes. While it is possible to make customizations within the MD and block layers of the Linux kernel, we want to keep most of these higher-level functions in user space. SPDK could allow us to do so, but to make this transparent to applications that require a traditional POSIX filesystem, IO operations need to pass through the VFS layer. One solution is to use the in-kernel NBD driver against the loopback device to re-route IO operations back into user space.
[ applications] [SPDK] —————— NVMe(TCP)———> to remote target
| [/mnt/db] | user
| | kernel
[ VFS ] |
[NBD]————— - [lo]
There are pros and cons associated with this approach, but the main advantages are transparency to applications (and filesystems) and keeping any higher-level functions out of the kernel. The main disadvantage is that this path is suboptimal for IO, perhaps due to the extra kernel-user context switches between NBD/lo and SPDK.
Before we embark on this route, we would like to know whether anyone else here has tried or considered this approach. If so, what were the takeaways? Can we do some performance tuning to make this path performant for IO?
Also, is anyone else interested in this approach, and could we perhaps collaborate and contribute back upstream? The Intel SPDK team mentioned that this path is not under active testing by them, but if there is a need, they can get involved as well.
Please contact us (reply-all) if you/your team is interested in this project.
I am looking into using spdk_threads with a custom scheduler, as presented by Ben at the recent summit. I understand that I can do that by calling spdk_thread_poll from within my scheduler's poller. But I see that various modules (such as bdev) use thread-local variables, which won't work if I move my poller to a different core. One example is spdk_bdev_open, which gets the spdk_thread from the TLS variable. Thoughts/ideas?
Could anyone help with how to configure git for spdk/qemu review?
The SPDK Development guide works only for spdk/spdk review.
Also, my commit to spdk/rocksdb review has been pending for a long time.
Could one of the SPDK core maintainers decide where it should go?