Hi all,
I was hoping to start a bit of a design discussion about the future of the
NVMe-oF target library (lib/nvmf). The NVMe-oF target was originally created as
part of a skunkworks project and was very much an application. It wasn't
divided into a library and an app as it is today. Right before we released it,
I decided to attempt to break it up into a library and an application, but I
never really finished that task. I'd like to resume that work now, but let the
entire community weigh in on what the library looks like.
First, libraries in SPDK (most things that live in lib/) shouldn't enforce a
threading model. They should, as much as possible, be entirely passive C
libraries with as few dependencies as we can manage. Applications in SPDK
(things that live in app/), on the other hand, necessarily must choose a
particular threading model. We universally use our application/event framework
(lib/event) for apps, which spawns one thread per core, etc. We'll continue
this model for NVMe-oF where app/nvmf_tgt will be a full application with a
threading model dictated by the application/event framework, while lib/nvmf
will be a passive C library that will depend only on other passive C libraries.
I don't think this distinction reflects reality at all today, but let's work to
make it so.
The other major issue with the NVMe-oF target implementation is that it has
quite a few baked in assumptions about what the backing storage device looks
like. In particular, it was written assuming that it was talking directly to an
NVMe device (Direct mode), and the ability to route I/O to the bdev layer
(Virtual mode) was added much later and isn't entirely fleshed out yet. One of
these assumptions is that real NVMe devices don't benefit from multiple queues
- you can get the full performance from an NVMe device using just one queue
pair. That isn't necessarily true for bdevs, which may be arbitrarily
complex virtualized devices. Given that assumption, the NVMe-oF target
today only creates a single queue pair to the backing storage device and only
uses a single thread to route I/O to it. We're definitely going to need to
break that assumption.
The first discussion that I want to have is around what the high level concepts
should be. We clearly need to expose things like "subsystem", "queue
pair/connection", "namespace", and "port". We should probably
have an object
that represents the entire target too, maybe "nvmf_tgt". However, in order to
separate the threading model from the library I think we'll need at least two
more concepts.
First, some thread has to be in charge of polling for new connections. We
typically refer to this as the "acceptor" thread today. Maybe the best way to
handle this is to add an "accept" function that takes the nvmf_tgt object as an
argument. This function can only be called on a single thread at a time and is
repeatedly called to discover new connections. I think the user will end up
passing a callback in to this function that will be called for each new
connection discovered.
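To make that a bit more concrete, here's a rough sketch of what I have in mind.
All of the names here (nvmf_tgt_accept, new_qpair_fn, the stub struct fields)
are hypothetical and up for debate - the stub just fakes a list of pending
connections so the shape of the call is visible:

```c
/* Hypothetical callback, invoked once per newly discovered connection. */
typedef void (*new_qpair_fn)(void *qpair, void *cb_arg);

/* Stub target. In the real library this would wrap the transports; here it
 * just holds a fake list of pending connections for illustration. */
struct nvmf_tgt {
	void *pending[8];
	int   num_pending;
};

/* Poll the target for new connections. Called repeatedly from exactly one
 * thread at a time (the "acceptor" thread). Returns how many were found. */
int
nvmf_tgt_accept(struct nvmf_tgt *tgt, new_qpair_fn cb_fn, void *cb_arg)
{
	int i, n = tgt->num_pending;

	for (i = 0; i < n; i++) {
		cb_fn(tgt->pending[i], cb_arg);
	}
	tgt->num_pending = 0;
	return n;
}

/* Example callback: just counts new connections via cb_arg. */
static void
bump_count(void *qpair, void *cb_arg)
{
	(void)qpair;
	(*(int *)cb_arg)++;
}
```

The important part is that the library never spawns a thread here - the
application decides which thread calls accept and how often.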
Second, once a new connection is discovered, we need to hand it off to some
collection that a dedicated thread can poll. This collection of connections
would be tied specifically to that dedicated thread, but it wouldn't
necessarily be tied to a subsystem or a particular storage device. I don't
really know what to call this thing - right now I'm thinking
"io_handler".
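Sketched out, an io_handler might look something like the following. Again,
every name here is hypothetical; the point is that only the owning thread ever
touches a given handler, which is how the library stays lock-free and avoids
dictating a threading model:

```c
#define IO_HANDLER_MAX_QPAIRS 64

/* Hypothetical "io_handler": a per-thread collection of connections. Only
 * the thread that owns the handler may call these functions. */
struct nvmf_io_handler {
	void *qpairs[IO_HANDLER_MAX_QPAIRS];
	int   num_qpairs;
};

/* Hand a newly accepted connection to this handler. Returns 0 on success,
 * -1 if the handler is full. */
int
nvmf_io_handler_add_qpair(struct nvmf_io_handler *h, void *qpair)
{
	if (h->num_qpairs == IO_HANDLER_MAX_QPAIRS) {
		return -1;
	}
	h->qpairs[h->num_qpairs++] = qpair;
	return 0;
}

/* Poll every connection in the handler once; returns the number polled.
 * A real implementation would process command/completion traffic here. */
int
nvmf_io_handler_poll(struct nvmf_io_handler *h)
{
	int i;

	for (i = 0; i < h->num_qpairs; i++) {
		/* process I/O for h->qpairs[i] */
	}
	return h->num_qpairs;
}
```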
So the general flow for an application would be to construct a target, add
subsystems, namespaces, and ports as needed, and then poll the target for
incoming connections. For each new connection, the application would assign it
to an io_handler (using whatever algorithm it wanted) and then poll the
io_handlers to actually handle I/O on the connections. Does this seem like a
reasonable design at a very high level? Feedback is very much welcome and
encouraged.
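One thing worth emphasizing is that the assignment policy belongs to the
application, not the library. For example, a simple round-robin policy (just
one possibility, with stand-in types - nothing here is proposed API) could
look like:

```c
#define MAX_QPAIRS 16

/* Minimal stand-in for an io_handler, just enough to show the flow. */
struct io_handler {
	int qpairs[MAX_QPAIRS];
	int num_qpairs;
};

/* Round-robin assignment: the application spreads each new connection
 * across its io_handlers in turn. Returns the next rotation index. */
static int
assign_round_robin(struct io_handler *handlers, int num_handlers,
		   int next, int qpair_id)
{
	struct io_handler *h = &handlers[next % num_handlers];

	h->qpairs[h->num_qpairs++] = qpair_id;
	return next + 1;
}
```

An application that wanted NUMA-aware placement, or least-loaded assignment,
could swap this out without the library caring at all.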
If I don't hear back with a bunch of "you're wrong!" or "that's
stupid!" type
replies over the next few days, the next step will be to write up a new header
file for the library that we can discuss in more detail.
Thanks,
Ben