Hi Fenggang,
Thank you for working on the important feature with community.
I’ve been impressed by your work.
I added a few preliminary feedback to the Gerrit.
I’m still reading your patch and haven’t read your following questions in the mail yet
too.
I will add any feedback if additionally found but Jim, Paul or others will provide better
feedback.
Thanks,
Shuhei
On May 13, 2018, at 3:28, Luse, Paul E <paul.e.luse(a)intel.com>
wrote:
Hi Fenggnag,
That’s great! I’ll do my best on your questions and then Jim or someone can help fill in
the blanks J Thanks for getting this posted, will try and look at it soon. As most are
aware the summit is next week and at least the Intel folks, most of us anyway, will be
flying out very early Mon so review times will be limited until after the summit.
Thanks!!!
Paul
<>
<>From: SPDK [mailto:
[email protected]
<mailto:
[email protected]>] On Behalf Of Fenggang Wu
Sent: Friday, May 11, 2018 9:42 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org
<mailto:
[email protected]>>
Subject: Re: [SPDK] vbdev_agg module (Was: Understanding io_channel)
Hi Jim and Paul,
I've ported the code over to the current version of SPDK and uploaded an RFC patch on
gerrithub <
https://review.gerrithub.io/#/c/spdk/spdk/+/410962/> for comments.
Besides, I got some questions when porting my code.
1) I got confused about io_device and bdev. What is the difference between them? I
implemented vbdev_agg by guessing the usage of the two by following the existing code, but
still lack a clear understanding. In the implementation, the conversion between the two is
here <
https://github.com/spdk/spdk/blob/master/lib/bdev/bdev.c#L220>. What does this
pointer shift mean?
So a bdev is the actual concrete implementation of a block device that you target IO to,
along with a channel. An IO device is a higher abstraction used by the block device layer
code to help keep track of channel stuff. If you look at the definition of
spdk_io_device_register() in io_channel.c you can see what I mean. It associates the
io_channel’s create/destroy callbacks for the io_device as well as keeps track of the
context size which is the size of your extension of the io_channel so it can be allocated
later when someone wants to get a channel for this thing. Then it puts that io_device
(address) on a global list so that anytime someone wants to get a channel (see
spdk_get_io_channel()) that global list is searched for the io_device and then either an
existing channel for that thread is returned and a ref count bumped or a new channel is
allocated and returned after the callback that was passed in when the io_device was
registered is called (your code). You’ll see the following macros that let the block
device layer make the simple conversion between bdev and io_device. One of the other guys
will have to explain the history behind this but if you follow those few functions I
mention hopefully it will start to make more sense. I guess that’s the link you mention
above also, I don’t see a shit but if you mean the +/- 1 that’s the conversion I mention
that’s going to require someone else to chime in as to why but hopefully you can see what
its doing at least J Since an io_device doesn’t have to be a bdev I suspect that’s why
there are these little helpers for when it is but I’m just guessing at this point.
#define __bdev_to_io_dev(bdev) (((char *)bdev) + 1)
#define __bdev_from_io_dev(io_dev) ((struct spdk_bdev *)(((char *)io_dev) - 1))
2) I tested my code using example/blobfs/mkfs. The log shows that the file system is
successfully initialized. However, the program hangs afterward. This problem does not
occur before my porting the code to the recent version of SPDK. The log shows one of the
based devices got hotremov()ed as my hotremove_cb() is called. I did not hotremove it
physically. What would be the possible reason? (conf file and running log is attached in
the end if you are interested).
Hmmm, I hadn’t thought of testing it that way. I’m sure one of the others can help with
this question, I’m not sure what could be going on.
3) About the implementation of the vbdev_agg_base_bdev_hotremove_cb(), I looked through
some other vbdev implementation of hotremove_cb() but couldn't extract a general idea
how to handle a hot-remove. In my vbdev_agg scenario, basically, the aggregated device is
not usable when any of the base devices are removed. Therefore, what should be a proper
action? Unregister the vbdev_agg, or?
Yeah, when you get that callback it’s totally up to you what to do based on the
characteristics of your vbdev. If you are a RAID0 type device (I haven’t looked at your
code yet) then you’ll want to do something for sure and unregistering it makes sense
initially at least. There’s another patch up there doing something similar,
seehttps://review.gerrithub.io/#/c/spdk/spdk/+/410484/
<
https://review.gerrithub.io/#/c/spdk/spdk/+/410484/> that you might want to take a
look at keeping in mind it was just posted also and has lots of comments so subject to
many changes as well.
Any ideas/comments are appreciated. Thanks!
Best!
-Fenggnag
PS: mkfs test
$> cat examples/agg/agg_test.conf
# [Nvme] section will get appended here by scripts/gen_nvme.sh.
[Ioat]
Disable Yes
[Gpt]
# If Gpt is disabled, it will not automatically expose GPT partitions as bdevs.
Disable Yes
[Nvme]
TransportID "trtype:PCIe traddr:0000:04:00.0" Nvme0
TransportID "trtype:PCIe traddr:0000:05:00.0" Nvme1
[Agg]
VBDev TestAgg Nvme0 Nvme1
$> sudo ./test/blobfs/mkfs/mkfs examples/agg/agg_test.conf TestAgg
Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...
[ DPDK EAL parameters: spdk_mkfs -c 0x3 -m 1024 --file-prefix=spdk_pid20742 ]
EAL: Detected 24 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Multi-process socket /var/run/.spdk_pid20742_unix
EAL: Probing VFIO support...
app.c: 521:spdk_app_start: *NOTICE*: Total cores available: 2
reactor.c: 669:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x3
reactor.c: 453:_spdk_reactor_run: *NOTICE*: Reactor started on core 1 on socket 1
reactor.c: 453:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
vbdev_agg.c: 880:vbdev_agg_examine: *NOTICE*: total_size 745GB (2 base devices)
Initializing filesystem on bdev TestAgg...done.
vbdev_agg.c: 573:vbdev_agg_base_bdev_hotremove_cb: *NOTICE*: base dev Nvme1n1 got
removed!!
(hangs...)
On Mon, May 7, 2018 at 1:36 AM Fenggang Wu <wuxx0835(a)umn.edu
<mailto:
[email protected]>> wrote:
Thanks, Jim and Paul. Glad to hear that!
I found my code is based on the SPDK version of last October, and bdev has evolved since
then (e.g. bdev registration, function return value, etc.). Let me update my code
accordingly and submit an RFC version soon.
Thanks again!
-Fenggang
On Fri, May 4, 2018 at 8:43 PM Luse, Paul E <paul.e.luse(a)intel.com
<mailto:
[email protected]>> wrote:
Totally agree w/Jim, looking forward to seeing the patch!
-Paul
<>From: SPDK [mailto:
[email protected]
<mailto:
[email protected]>] On Behalf Of Harris, James R
Sent: Friday, May 4, 2018 5:12 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org
<mailto:
[email protected]>>
Subject: Re: [SPDK] vbdev_agg module (Was: Understanding io_channel)
Hi Fenggang,
Sorry for the long delay in responding.
I think the vbdev_agg (striping) feature would fit well in the SPDK landscape. It
complements lvol – allowing someone to stripe a block device across multiple underlying
block devices, and then put a Blobstore/lvolstore on top of that.
Maybe you can post your vbdev_agg changes as an RFC patch to GerritHub? You can add a
lengthy commit message or a README to add any explanations, questions, etc. If you add
“[RFC]” in the title, it will not run through the test pool (if that is a concern right
now). Then the community can review and provide feedback through the normal code review
process.
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org <mailto:
[email protected]>> on
behalf of Fenggang Wu <wuxx0835(a)umn.edu <mailto:
[email protected]>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org
<mailto:
[email protected]>>
Date: Monday, April 23, 2018 at 8:11 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org
<mailto:
[email protected]>>
Subject: [SPDK] vbdev_agg module (Was: Understanding io_channel)
Hi Jim,
Our industry partner is pushing forward on this vbdev_agg module along with other
efforts, as Tushar and I recently discussed in a separate context.
I think it is a good time to finalize the possible contribution to SPDK too. I notice
SPDK have the logical volume (lvol) developed. Just wondering how this vbdev_agg may fit
into current SPDK landscape? What would be the suggested path to the main branch?
Thanks!
-Fenggang
On Tue, Oct 10, 2017 at 4:04 PM Harris, James R <james.r.harris(a)intel.com
<mailto:
[email protected]>> wrote:
> On Oct 10, 2017, at 1:53 PM, Fenggang Wu <fenggang(a)cs.umn.edu
<mailto:
[email protected]>> wrote:
>
> Hi Jim,
>
> Thank you very much for the great answer! It makes perfect sense to me. This saves
so much time.
>
>
> On Mon, Oct 9, 2017 at 7:04 PM Harris, James R <james.r.harris(a)intel.com
<mailto:
[email protected]>> wrote:
> Hi Fenggang,
>
> > On Oct 9, 2017, at 12:04 PM, Fenggang Wu <fenggang(a)cs.umn.edu
<mailto:
[email protected]>> wrote:
> >
> > Hi,
> >
> > I am new to SPDK and trying to develop an aggregated virtual block device
module (vbdev_agg.c) that stripes across multiple base devices. I am having difficulty
understanding
>
> Welcome to SPDK! An aggregated virtual block device module is interesting - will
this do striping and/or concatenation?
>
> Current I am only considering striping. But I would expect an easy extension from
striping to concatenation.
I think striping is the more interesting and common use case.
> <snip>
>
> Now I have learned from the nvme module and register my agg_disk struct as the void*
io_device, or better named, the "unique pointer". Currenly, a space with a size
of a array of io_channel pointers is allocated after the io_channel struct. The
io_channels pointers of the base devices are kept in the array. They are got
(get_io_channel(base_dev)) in the create_cb and put (put_io_channel(base_dev)) in the
destroy_cb.
Yes - that sounds right.
>
>
> >
> > Any suggestions/hints will be appreciated. Thank you very much!
>
> If you would like to post your module to GerritHub, I’m sure you’d get some good
review feedback from myself and others. Please note that this is a very active area of
development right now. Your questions are really appreciated and will help us clarify
where we need to improve on example code and documentation.
>
>
> Personally I would like to share it or even make some contribution to the community
if possible. Yet I would have to double check with the industry partner supporting my
project to see their opinions.
That would be fantastic. A striping module would be generally useful to SPDK. Plus if
the module is contributed to SPDK, the SPDK project will make sure it gets tested
automatically as part of the per-patch test suite to ensure against regressions. This
might be important to your industry partner.
Thanks,
-Jim
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org <mailto:
[email protected]>
https://lists.01.org/mailman/listinfo/spdk
<
https://lists.01.org/mailman/listinfo/spdk>
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org <mailto:
[email protected]>
https://lists.01.org/mailman/listinfo/spdk
<
https://lists.01.org/mailman/listinfo/spdk>___________________________...
SPDK mailing list
SPDK(a)lists.01.org <mailto:
[email protected]>
https://lists.01.org/mailman/listinfo/spdk
<
https://lists.01.org/mailman/listinfo/spdk>