On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin wrote:
On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
> Hi all,
> while looking at the block device subsystem code, I noticed bdev
> claims being used for two purposes at the same time. First (and a
> primary one, as far as I understood it) is a perfectly reasonable idea
> to provide the bdev subsystem with dependency information so that
> block devices could be brought up and shut down in the proper order
> (which doesn't seem to be in place yet, though).
> On the other hand, claims are somewhat counter-intuitively also used
> to ensure single writer semantics where only a module that has claimed
> the block device has write access. While it may be considered a safety
> measure of sorts, it doesn't go well with the case where a
> higher-level application is interested in shared access to the
> underlying block device(s), in particular when there are
> application-level means to ensure access consistency in a
> multiple-writer scenario.
> While adding support for the shared access semantics seems to be
> pretty straightforward, I thought it reasonable to bring the issue up
> here first, looking for insights, comments, and objections. Please let
> me know if there are any or I can give a shot to a shared access
Your analysis is all correct. There are two access control mechanisms in the
bdev layer for different purposes; claiming, which is for stacking bdevs
together to make I/O pipelines, and descriptors, which is standard read/write
access control for consumers of bdevs. We thought it made the most sense to
allow write access to a bdev from only a single module for safety reasons.
What seems to be a fairly reasonable safety measure for a general-purpose
I/O stack can easily become an unwarranted constraint in a tightly-controlled
(say, appliance) setting. Turning single writer into a stack-level
option could buy us the best of both worlds at once.
That's actually a fairly narrow limitation and I'm not sure it is precluding you
from doing what you are looking to do.
The rule, as it is written today, is that if a bdev isn't claimed by a bdev
module (i.e. it doesn't have a bdev stacked on top of it), then multiple write
descriptors can be used. If it is claimed by a bdev module (i.e. another bdev is
stacked on top of it), then other consumers may only open it as read only.
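The rule above can be modeled in a few lines. This is a toy Python sketch, not
SPDK code: the Bdev class and its claim/open methods are hypothetical names
that only mirror the behaviour described in this message. It also assumes, as
the original question states, that the claiming module itself keeps write
access.

```python
class Bdev:
    """Toy model of the access rule described above (not the SPDK API)."""

    def __init__(self, name):
        self.name = name
        self.claimed_by = None  # name of the claiming bdev module, if any

    def claim(self, module):
        """Record that `module` has stacked a bdev on top of this one."""
        if self.claimed_by is not None:
            raise PermissionError(f"{self.name} already claimed by {self.claimed_by}")
        self.claimed_by = module

    def open(self, write, module=None):
        """Return a descriptor (a dict here) or raise if the rule forbids it."""
        if write and self.claimed_by is not None and module != self.claimed_by:
            raise PermissionError(f"{self.name} is claimed; open it read-only")
        return {"bdev": self.name, "write": write}


def can_open(bdev, write, module=None):
    """Convenience helper: True if the open would succeed."""
    try:
        bdev.open(write, module)
        return True
    except PermissionError:
        return False


# Unclaimed bdev: any number of write descriptors may be opened.
disk = Bdev("Nvme0n1")
unclaimed_results = [can_open(disk, write=True), can_open(disk, write=True)]

# Claimed bdev: other consumers are limited to read-only descriptors.
disk.claim("raid")
claimed_results = {
    "other_writer": can_open(disk, write=True),
    "other_reader": can_open(disk, write=False),
    "claiming_module": can_open(disk, write=True, module="raid"),
}
```

In this model the restriction only appears once something is stacked on top of
the bdev, which is the "fairly narrow limitation" referred to above.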
Is your use case that you have some stack of bdevs, and then at some point your
application wants to write to one of the bdevs that's not at the top of the
stack? As a concrete example, this would be like having a RAID volume that you
normally write to, but at some point your application bypasses the RAID logic
and writes directly to one of the disks that back it. That sounds fishy to me,
but I'm open to legitimate use cases here.
The above logic is impeccable in a single-system setting, but makes far less
sense if one considers a distributed (e.g. cluster) environment where multiple
instances of the same stack are active simultaneously.
If a write request in such a system is allowed to enter the local stack
below the top-level bdev, it would hit the single-writer wall while being
perfectly legitimate, in the sense that it comes from the write-capable
top-level bdev, just another instance of it :).
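To make the cluster case concrete, here is a small Python sketch of the check a
forwarded write would run into, together with the stack-level option suggested
earlier in this reply. Everything in it is hypothetical: open_for_write, the
shared_write flag, and the "cluster_transport" opener are illustration names,
not existing SPDK interfaces.

```python
def open_for_write(claimed_by, opener, shared_write=False):
    """Return True if `opener` may get a write descriptor on a base bdev.

    claimed_by   -- module that claimed the base bdev, or None
    opener       -- who is asking for the descriptor
    shared_write -- hypothetical stack-level option proposed in this thread
    """
    if claimed_by is None or shared_write:
        return True
    # Single-writer rule: only the claiming module may write.
    return opener == claimed_by


# Node A runs the stack locally: its "raid" module claimed the base disk.
local_write = open_for_write(claimed_by="raid", opener="raid")

# A write forwarded from the same stack on node B enters node A below the
# top-level bdev, so node A sees it as an ordinary consumer.
peer_write = open_for_write(claimed_by="raid", opener="cluster_transport")

# Same request with the hypothetical single-writer check turned off.
peer_write_shared = open_for_write(
    claimed_by="raid", opener="cluster_transport", shared_write=True
)
```

Under these assumptions the forwarded write is rejected by default and accepted
only when the stack opts in to shared access, leaving consistency to the
application-level mechanism.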
If your application wants to use a bdev that is at the top of a stack from
multiple places, then you should be able to open multiple write descriptors
today. Just don't have your application claim the bdev.
Please correct me if I'm wrong, but even this isn't the case with virtual bdevs,
where one needs to claim underlying bdevs for the virtual bdev even to get
We've tried to make this as clear as possible, but it could really use some
better documentation at a minimum. Discussion about how best to present
these concepts to users and make them easier to use is always very welcome. Let me
know what your thoughts are.
> Best regards,
> SPDK mailing list