Here is the comment in nvme_pcie_qpair_construct():
/*
 * Reserve space for all of the trackers in a single allocation.
 * struct nvme_tracker must be padded so that its size is already a power of 2.
 * This ensures the PRP list embedded in the nvme_tracker object will not span a
 * 4KB boundary, while allowing access to trackers in tr[] via normal array indexing.
 */
It's a beautiful design in SPDK. :)
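
To make the comment concrete, here is a minimal sketch of the idea. It is not the SPDK code: struct demo_tracker, its field names, and the helper functions are hypothetical, and it assumes the tracker is padded to exactly 4096 bytes and that the array is allocated on a 4 KiB boundary. Under those assumptions every tr[i] occupies exactly one page, so the embedded PRP list can never cross a 4 KiB boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE   4096u
#define DEMO_PRP_ENTRIES 506u

/* Hypothetical stand-in for struct nvme_tracker; the real layout differs. */
struct demo_tracker {
	/* Placeholder for the non-PRP fields (assumed to fill the remainder). */
	uint8_t  other_fields[DEMO_PAGE_SIZE - DEMO_PRP_ENTRIES * sizeof(uint64_t)];
	/* Embedded PRP list, 8 bytes per entry. */
	uint64_t prp[DEMO_PRP_ENTRIES];
};

/* The size must be a power of 2; here it is assumed to be one 4 KiB page. */
_Static_assert(sizeof(struct demo_tracker) == DEMO_PAGE_SIZE,
	       "demo_tracker must be exactly 4096 bytes");

/* Returns 1 if the byte range [start, start + len) stays inside one 4 KiB page. */
static int within_one_page(uintptr_t start, size_t len)
{
	return (start / DEMO_PAGE_SIZE) == ((start + len - 1) / DEMO_PAGE_SIZE);
}

/*
 * Allocates num_trackers trackers in a single page-aligned allocation and
 * checks that every embedded PRP list stays inside one page.
 * Returns 1 if all lists pass, 0 on failure or allocation error.
 */
static int demo_check_trackers(size_t num_trackers)
{
	struct demo_tracker *tr;
	size_t i;
	int ok = 1;

	if (num_trackers == 0) {
		return 0;
	}

	tr = aligned_alloc(DEMO_PAGE_SIZE, num_trackers * sizeof(*tr));
	if (tr == NULL) {
		return 0;
	}

	for (i = 0; i < num_trackers; i++) {
		/* Plain array indexing works because sizeof(*tr) is one page. */
		if (!within_one_page((uintptr_t)tr[i].prp, sizeof(tr[i].prp))) {
			ok = 0;
		}
	}

	free(tr);
	return ok;
}
```

If the struct were, say, 4104 bytes instead, the second tracker would start 8 bytes into the next page and later trackers would drift until a PRP list straddled a page boundary, which is what the padding rule prevents.
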
On Wed, Aug 9, 2017 at 1:18 PM, Liu, Changpeng <changpeng.liu(a)intel.com> wrote:
Yes, you are right.
SPDK embeds the PRP list in struct nvme_tracker, and the data structure is
4 KiB aligned. The tracker also holds several other fields, so only 506
entries are left for the PRP list.
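
The arithmetic behind that can be sketched as follows. This is only an illustration: the SKETCH_* names and helper functions are made up, and it assumes the tracker is exactly 4096 bytes with 8-byte PRP entries. With 506 entries, 4096 - 506 * 8 = 48 bytes remain for the other fields, and the driver limit is 506 * 4096 bytes = 2024 KiB.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TRACKER_SIZE   4096u /* assumed size of one tracker, in bytes */
#define SKETCH_PAGE_SIZE      4096u /* memory page size, in bytes */
#define SKETCH_PRP_ENTRY_SIZE 8u    /* one PRP entry is a 64-bit address */
#define SKETCH_PRP_ENTRIES    506u  /* entries in the embedded PRP list */

/* Bytes left in the tracker for everything that is not the PRP list. */
static uint32_t sketch_other_field_bytes(void)
{
	return SKETCH_TRACKER_SIZE - SKETCH_PRP_ENTRIES * SKETCH_PRP_ENTRY_SIZE;
}

/* Driver-side limit as stated in the thread: 506 * PAGE_SIZE bytes. */
static uint32_t sketch_driver_max_xfer(void)
{
	return SKETCH_PRP_ENTRIES * SKETCH_PAGE_SIZE;
}

/* The limit actually used is the smaller of the driver and controller limits. */
static uint32_t sketch_effective_max_xfer(uint32_t ctrlr_limit_bytes)
{
	uint32_t driver_limit = sketch_driver_max_xfer();

	return ctrlr_limit_bytes < driver_limit ? ctrlr_limit_bytes : driver_limit;
}
```

For a controller that reports a 128 KiB limit, sketch_effective_max_xfer(128 * 1024) returns 131072, so the hardware limit is the one that applies.
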
> -----Original Message-----
> From: SPDK [mailto:[email protected]] On Behalf Of Lance Hartmann ORACLE
> Sent: Wednesday, August 9, 2017 1:01 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Determination of NVMe max_io_xfer_size (NVME_MAX_XFER_SIZE) ?
>
>
> Ok, but 506 * PAGE_SIZE? Surely 506 wasn’t arbitrarily selected? I understand
> that the controller’s Identify Controller structure may indicate far fewer pages
> supported, but if, as the comment suggests, PRP2 is pointing to a list, then why
> reduce the number “just a few”? I feel like I’m missing something.
>
> Let’s say PRP1 is aligned to a memory page boundary and the length of the data
> transfer is more than two (2) memory pages. PRP1 points to the first memory
> page of data, and PRP2 points to a memory page containing PRP entries; i.e. a
> PRP list. If the memory page size is 4096 (4KB), then up to 4096 / (size of PRP
> pointer in bytes) = 4096 / 8 = 512 of PRP entries could be created in that page.
> Thus, if I follow it correctly, with PRP1 pointing to 4KB of data in the first memory
> page, and with PRP2 pointing to a 4KB page of PRP entries, we should be able to
> transfer 1 + 512 = 513 memory pages, and so in this case 513 * 4096 = 2,101,248
> bytes of data. And, that’s only if the implementation of the SPDK NVMe driver
> elects not to support the mechanism of using the last entry of the page of PRP
> entries to point to another page of PRP entries.
>
> --
> Lance Hartmann
> lance.hartmann(a)oracle.com
>
>
> > On Aug 8, 2017, at 11:24 PM, Liu, Changpeng <changpeng.liu(a)intel.com> wrote:
> >
> > Hi Lance,
> >
> > NVME_MAX_XFER_SIZE is the maximum data length supported by the SPDK driver.
> > The NVMe controller also has a field (MDTS) that reports the hardware limit,
> > so the driver takes the smaller of the two as the command limit and splits
> > any command larger than that.
> >
> > Most Intel NVMe SSDs report a hardware limit of 128 KiB, so the driver limit
> > of (506*4) KiB is big enough to support it.
> >
> >> -----Original Message-----
> >> From: SPDK [mailto:[email protected]] On Behalf Of Lance Hartmann ORACLE
> >> Sent: Wednesday, August 9, 2017 11:52 AM
> >> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> >> Subject: [SPDK] Determination of NVMe max_io_xfer_size (NVME_MAX_XFER_SIZE) ?
> >>
> >> Hello,
> >>
> >> I’m trying to reconcile the #define NVME_MAX_XFER_SIZE and leading comment:
> >>
> >> /*
> >> * For commands requiring more than 2 PRP entries, one PRP will be
> >> * embedded in the command (prp1), and the rest of the PRP entries
> >> * will be in a list pointed to by the command (prp2). This means
> >> * that real max number of PRP entries we support is 506+1, which
> >> * results in a max xfer size of 506*PAGE_SIZE.
> >> */
> >>
> >> in lib/nvme/nvme_pcie.c with my interpretation from reading the NVMe spec.
> >> I’d greatly appreciate if someone could “show me the math” or otherwise help
> >> me to understand this. How was NVME_MAX_PRP_LIST_ENTRIES (506) derived?
> >> I don’t know if I’m lost in the semantics of the naming, the comment, or perhaps
> >> there’s a nuance in the “…we support…” part. I would’ve guessed, otherwise,
> >> that the max # of PRP entries would be a function of the PAGE_SIZE.
> >>
> >> I did see that the driver in nvme_ctrlr_identify() compares this derived maximum
> >> transfer size with that which the controller can actually support as reported in
> >> the Identify Controller structure, choosing the minimum of the two values, but
> >> that’s understood and separate from the above.
> >>
> >> regards,
> >>
> >>
> >> --
> >> Lance Hartmann
> >> lance.hartmann(a)oracle.com
> >>
> >>
> >> _______________________________________________
> >> SPDK mailing list
> >> SPDK(a)lists.01.org
> >> https://lists.01.org/mailman/listinfo/spdk