glad to see this moving forward.
On Wed, Jun 20, 2018 at 8:24 PM Walker, Benjamin <benjamin.walker(a)intel.com>
I'm trying to get back to some messages that slipped through the cracks. I
think what you're doing here is important and needs to be addressed. Responses
inline.
You've actually caught me in the process of writing a status update
email which, to make a long story short, is very much in line with what
you write below.
With some cosmetic changes (mostly related to vbdev->base_bdevs and
bdev->vbdevs run-time management), I was able to get a multi-tenant vbdev
to properly handle its base bdev's hot-plug/remove events. The major
ingredient of the SPDK bdev layer infrastructure I was missing is the
channel iterator, which proved to be the key to safely disabling/enabling
channels for the base bdev being removed/added. Other than that, the only
truly missing thing was the base bdev addition under vbdev->base_bdevs
at run-time and some minor extras (which I can turn into a specific patch
and submit for review), so the bdev layer has basically proven me wrong. The
item standing out is bdev subsystem initialization/shutdown, more on this
below.
On Thu, 2018-05-31 at 14:16 +0300, Andrey Kuzmin wrote:
> Planning for a multi-tenant virtual bdev driver, I looked into the
> base bdev management capabilities and found them short of what I need. The
> issues I see are outlined below. Let me know if the analysis is correct
> and, if yes, whether there are any plans to provide for the dynamic base
> bdev management capabilities in the multi-tenant vbdev use case.
> 1. Vbdev startup
> spdk_vbdev_register at present allows one to register a completely
> assembled vbdev (with all base bdevs already examined) only. The root cause
> behind the fully-assembled requirement above is the
> spdk_vbdev_set_base_bdevs call that follows, which assumes that vbdev's
> base bdevs haven't been set up yet.
> Apparently, a non-trivial multi-tenant vbdev should be allowed to start in
> a partially assembled state; erasure code-based RAID provides an
> example of a vbdev that is expected to be/remain operational while an
> arbitrary number of base bdevs is missing permanently or temporarily, in
> particular (but not limited to) at startup time.
> Furthermore, a vbdev like this should be able to register a hot-plugged
> base bdev at any point of runtime, yet again pointing to the need for a
> vbdev_register_base_bdev(vbdev, base_bdev) call in addition to/in place
> of the available spdk_vbdev_set_base_bdevs method (more on this under hot
> plug below).
I agree that bdev modules should be able to expose bdevs that are only
partially assembled and at run time add or remove base bdevs as necessary.
I also agree that the vbdev_* API has a lot of assumptions about when the
base bdevs are known, and that is not going to work for you. However, the
vbdev_* APIs are convenience wrappers only. You can perform every required
operation in a module without using those, and I think that's what you're
going to want here. I think we need to audit the bdev module API and clarify
which are the "fundamental" ones and which are these convenience vbdev things
that work for 90% of vbdevs but not all of them. Since you sent this note,
I've created a public header file that is intended to define the bdev module
API officially (include/spdk/bdev_module.h). Now we just need to iterate on
it and make it clearer.
Yes, I noticed this one.
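
To make the "partially assembled" idea concrete, here is a minimal
stand-alone sketch in plain C. It does not use the SPDK API: struct
base_bdev, struct mt_vbdev and the mt_vbdev_* functions are hypothetical
names invented for illustration, and min_present is an assumed threshold
(for example k in a k+m erasure-coded layout). The point is only that a
slot-based member table lets the same call serve both startup and hot plug.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BASE_BDEVS 8

/* Hypothetical stand-in for a base bdev; NOT the SPDK struct spdk_bdev. */
struct base_bdev {
    const char *name;
};

/* Hypothetical multi-tenant vbdev that tracks its members by slot. */
struct mt_vbdev {
    struct base_bdev *slots[MAX_BASE_BDEVS]; /* NULL means "member missing" */
    size_t num_slots;    /* members the vbdev is configured for */
    size_t min_present;  /* members required to serve I/O (assumption) */
    size_t num_present;  /* members currently attached */
};

void mt_vbdev_init(struct mt_vbdev *v, size_t num_slots, size_t min_present)
{
    size_t i;

    for (i = 0; i < MAX_BASE_BDEVS; i++) {
        v->slots[i] = NULL;
    }
    v->num_slots = num_slots < MAX_BASE_BDEVS ? num_slots : MAX_BASE_BDEVS;
    v->min_present = min_present;
    v->num_present = 0;
}

/* Attach a base bdev to a free slot at any time: at startup or on hot plug.
 * Returns 0 on success, -1 if the slot is out of range or already taken. */
int mt_vbdev_add_base(struct mt_vbdev *v, size_t slot, struct base_bdev *b)
{
    if (slot >= v->num_slots || v->slots[slot] != NULL) {
        return -1;
    }
    v->slots[slot] = b;
    v->num_present++;
    return 0;
}

/* The vbdev may be exposed as soon as enough members are present. */
bool mt_vbdev_is_operational(const struct mt_vbdev *v)
{
    return v->num_present >= v->min_present;
}
```

With three slots and min_present set to two, the vbdev becomes operational
after the second mt_vbdev_add_base() call, and the third member can be
attached later without re-registering anything.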
> 2. Bdev surprise removal
> SPDK bdev ops vector includes .hotremove method which, for each open
> descriptor, gives vbdev module an opportunity to clean up and/or do any
> redundancy-related base bdev management.
> While .hotremove provides for the vbdev-internal bdev management on hot
> remove, spdk_bdev_unregister which completes hot-remove handling in the
> bdev layer does not remove base bdev from vbdev's base bdev list, so the
> base bdev in question still sits on the list after being removed. The
> reason is the missing vbdev->base_bdevs dynamic management in general and
> the vbdev_remove_base_bdev(vbdev, bdev) call in particular, required to
> update the vbdev->base_bdevs list on a single bdev removal.
Agreed - the vbdev wrappers need to either be improved, or you need to use
the lower-level APIs.
I have a patch for this specifically.
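
As an illustration of the single-bdev removal discussed above (and not the
patch itself), the sketch below removes one entry from an array-backed list
while keeping the remaining entries in order. struct bdev_ref, struct
base_list and base_list_remove() are hypothetical names; the real
vbdev->base_bdevs layout may differ.

```c
#include <stddef.h>

#define BASE_LIST_CAP 8

/* Hypothetical stand-in for a bdev reference; NOT an SPDK type. */
struct bdev_ref {
    const char *name;
};

/* Hypothetical array-backed list modelling a vbdev's base bdev list. */
struct base_list {
    struct bdev_ref *items[BASE_LIST_CAP];
    size_t count;
};

/* Remove a single base bdev from the list, preserving the order of the
 * remaining entries. Returns 0 on success, -1 if the bdev is not listed. */
int base_list_remove(struct base_list *l, const struct bdev_ref *b)
{
    size_t i, j;

    for (i = 0; i < l->count; i++) {
        if (l->items[i] != b) {
            continue;
        }
        /* Shift the tail down by one to close the gap. */
        for (j = i + 1; j < l->count; j++) {
            l->items[j - 1] = l->items[j];
        }
        l->count--;
        l->items[l->count] = NULL;
        return 0;
    }
    return -1;
}
```

Calling this from the hot-remove path would keep the list consistent with
the set of base bdevs that are actually present.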
> 3. Bdev hot plug
> At present virtual bdev design does not seem to provide any support for
> base bdev hot-plug. Vbdev's extant .examine method seems to be geared
> toward initial vbdev setup in that it assumes no open vbdev descriptors (so
> that vbdev to base bdevs descriptor linkage occurs when vbdev is
> subsequently opened and I/O channels are created).
> There is currently no .hotplug mechanism complementary to .hotremove that
> would propagate base bdev insertion throughout all open vbdev descriptors
> so that vbdev has a chance to set up the I/O channel/do other house-keeping
> for the plugged base bdev on each vbdev descriptor/channel open at the time
> of the base bdev insertion.
You need a mechanism such that on hot-insert a message is sent to each
I/O channel for the bdev to perform per-channel initialization? Can you
Right on the spot :).
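
The per-channel pattern can be sketched as follows. In SPDK the channel
iterator (spdk_for_each_channel) delivers the callback on each channel's
own thread; the code below is only a single-threaded stand-in that shows
the shape of the pattern, and every type and function name in it
(vbdev_channel, vbdev_for_each_channel, channel_attach_base, hotplug_done)
is hypothetical.

```c
#include <stddef.h>

#define MAX_CHANNELS 4
#define MAX_BASES 4

/* Hypothetical per-channel state: one "ready" flag per base bdev slot. */
struct vbdev_channel {
    int base_ready[MAX_BASES];
};

/* Hypothetical vbdev with a table of its currently open channels. */
struct vbdev {
    struct vbdev_channel *channels[MAX_CHANNELS];
    size_t num_channels;
};

typedef void (*channel_fn)(struct vbdev_channel *ch, void *ctx);
typedef void (*done_fn)(void *ctx);

/* Visit every open channel, then signal completion exactly once.
 * A real implementation would hop threads; this one just loops. */
void vbdev_for_each_channel(struct vbdev *v, channel_fn fn, void *ctx,
                            done_fn done)
{
    size_t i;

    for (i = 0; i < v->num_channels; i++) {
        fn(v->channels[i], ctx);
    }
    if (done != NULL) {
        done(ctx);
    }
}

/* Context carried through one hot-insert event. */
struct hotplug_ctx {
    size_t slot;    /* slot of the base bdev that was just inserted */
    int completed;  /* set once all channels have been updated */
};

/* Per-channel step: set up channel state for the inserted base bdev. */
void channel_attach_base(struct vbdev_channel *ch, void *ctx)
{
    struct hotplug_ctx *h = ctx;

    ch->base_ready[h->slot] = 1;
}

/* Completion step: only now is it safe to route I/O to the new member. */
void hotplug_done(void *ctx)
{
    struct hotplug_ctx *h = ctx;

    h->completed = 1;
}
```

The same iterate-then-complete shape works in the other direction for hot
remove, with the per-channel step disabling the slot instead of enabling it.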
> 4. Vbdev shutdown
> It appears that, while bdev subsystem start-up proceeds in the expected
> bottom-up fashion, with vbdevs instantiated as the underlying base bdevs
> come up, the reverse is not true: on bdev subsystem shutdown, I see vbdev's
> .hotremove being called where I would expect vbdev being unregistered.
> Understandably, for a vbdev module author it would be very helpful to be
> able to differentiate between planned (sub)system shutdown and hot removal
> of a base bdev at run time; for this to happen, bdev subsystem shutdown
> should proceed top-down, with virtual bdevs unregistered prior to the
> underlying base bdevs.
I agree - Pawel Wodkowski is working in this area.
Just as a suggestion, below is what I arrived at re this specific subject.
*Bdev subsystem init/shutdown*
- While the bdev layer currently provides an init_complete hook for bdevs
interested in taking any special action once bdev subsystem initialization
is done (such as avoiding device assembly until all present base bdevs have
been examined), that does not seem to be of much help, as there is no way
for the bdev module to check if its examine method is being called during
subsystem initialization or at run-time (which are two completely different
scenarios from a vbdev author's standpoint).
If we had something like spdk_bdev_subsystem_init_in_progress() that would
allow a bdev module to check that in its examine method, it could then take
actions as appropriate for either init or run-time hot-plug scenarios.
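
A minimal sketch of how such a check might be used, in stand-alone C. The
query function here is a hypothetical model of the proposal above, not an
existing SPDK call, and the flag, the enum and vbdev_examine() are likewise
invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical subsystem state: true until initialization has finished. */
static bool g_init_in_progress = true;

/* Model of the proposed query; NOT an existing SPDK function. */
bool bdev_subsystem_init_in_progress(void)
{
    return g_init_in_progress;
}

/* Would be called by the subsystem once all present bdevs are examined. */
void bdev_subsystem_init_done(void)
{
    g_init_in_progress = false;
}

enum examine_action {
    EXAMINE_DEFER_ASSEMBLY, /* init: record the bdev, assemble later */
    EXAMINE_HOTPLUG_ATTACH  /* run-time: attach to the live vbdev now */
};

/* Hypothetical examine path that branches on the subsystem state. */
enum examine_action vbdev_examine(void)
{
    if (bdev_subsystem_init_in_progress()) {
        return EXAMINE_DEFER_ASSEMBLY;
    }
    return EXAMINE_HOTPLUG_ATTACH;
}
```

During initialization the module would only collect base bdevs and leave
assembly to its init_complete handling; after that point the same examine
entry would be treated as a hot-plug event.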
- Bdev subsystem shutdown, IMO, could use an update similar to
spdk_bdev_next_leaf() (though based on bdev->vbdevs) that would make it
proceed in the top-down fashion, with bdev graph walked via bdev->vbdevs
paths and shutdown then initiated by bdev_unregister iterator walking the
graph from top-level vbdevs down to base bdevs. That would let virtual bdev
module rely on the assumption that regular shutdown does not involve
base_bdev_hotremove being called, with the latter then reserved for
hot-remove run-time events.
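
The top-down ordering can be sketched as a post-order walk along the
"stacked on top" links, which is roughly the role bdev->vbdevs plays in the
description above. This is a stand-alone model under the assumption that
the bdev graph has no cycles; struct node and shutdown_top_down() are
hypothetical names, not SPDK code.

```c
#include <stddef.h>

#define MAX_UPPER 4

/* Hypothetical graph node modelling a bdev and the vbdevs built on it. */
struct node {
    const char *name;
    struct node *vbdevs[MAX_UPPER]; /* bdevs stacked on top of this one */
    size_t num_vbdevs;
    int unregistered;               /* set once the node has been emitted */
};

/* Append nodes to order[] in unregister order: everything stacked on top
 * of n first, then n itself. Nodes already unregistered are skipped, so a
 * vbdev that sits on several base bdevs is emitted only once.
 * Returns the new number of entries in order[]. */
size_t shutdown_top_down(struct node *n, struct node **order, size_t pos)
{
    size_t i;

    if (n->unregistered) {
        return pos;
    }
    for (i = 0; i < n->num_vbdevs; i++) {
        pos = shutdown_top_down(n->vbdevs[i], order, pos);
    }
    n->unregistered = 1;
    order[pos++] = n;
    return pos;
}
```

Running the walk from every base bdev yields an order in which a vbdev is
always unregistered before any bdev underneath it, so its hot-remove
callback never needs to fire during a planned shutdown.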
I also wanted to mention that a few patches have gone in which allow a
bdev module to perform its examine without write access. At least one more
is required still, but once complete we can merge your patch that separates
claiming bdevs from opening them.
That's definitely good news.
Once that is done, we need to begin work to differentiate the various
reasons a bdev could be examined and the various reasons a bdev could be
removed.