On Tue, 2014-11-04 at 16:26 +0000, Elliott, Robert (Server Storage)
> > -----Original Message-----
> > From: Boaz Harrosh [mailto:[email protected]]
> > Sent: Tuesday, 04 November, 2014 4:38 AM
> > To: Wilcox, Matthew R; Elliott, Robert (Server Storage); Ross
> > Zwisler; Jens Axboe; Nick Piggin; Kani, Toshimitsu; Knippers, Linda;
> > linux-fsdevel(a)vger.kernel.org; linux-kernel(a)vger.kernel.org; linux-
> > nvdimm(a)lists.01.org; Matthew Wilcox
> > Subject: Re: [PATCH 1/4] pmem: Initial version of persistent memory
> > driver
> > On 11/03/2014 06:19 PM, Wilcox, Matthew R wrote:
> > I wish you guys would actually review the correct code.
> > In the actual good driver that has any shape of proper code all these
> > issue are gone.
> > * config defaults gone, multiple-devices multiple-memory ranges fully
> > supported hot plug style.
> > * above shifts cruft completely gone it is left overs from brd.c and
> > its page usage.
> > * getgeo fixed to do what we realy want by the only application on earth
> > that still uses it, fdisk. All other partitioners do not call it at
> > all.
> > Why are we reviewing dead code ?
> > Cheers
> > Boaz
> Ross, what's the status of Boaz' patches (available in
> https://github.com/01org/prd.git doesn't include any of
> them yet.
The UEFI organization is in the process of defining a generic specification
for platform non-volatile memory resources. Essentially the thought was to
wait until that was publicly available before adding any new device discovery
capabilities to pmem.
What Boaz has suggested and coded up is certainly useful, but the worry is
that it will end up being incompatible with what comes out of UEFI. If we
stay with the dead-simple module parameter method, we will have less code to
On 11/03/2014 06:19 PM, Wilcox, Matthew R wrote:
>>> +config BLK_DEV_PMEM_COUNT
>>> + int "Default number of PMEM disks"
>>> + default "4"
>> For real use I think a default of 1 would be better.
> For real use, you need at least two to run xfstests. This whole configuration mechanism is going away soon anyway.
>>> + size_t page_offset = sector >> PAGE_SECTORS_SHIFT;
>>> + size_t offset = page_offset << PAGE_SHIFT;
>> Since page_offset is only used to calculate offset, they
>> could be joined to avoid relying on the compiler to
>> optimize it away:
>> size_t offset = sector >> PAGE_SECTORS_SHIFT << PAGE_SHIFT;
> If you insist on doing the compiler's job for it, why not:
> size_t offset = sector >> (PAGE_SECTORS_SHIFT - PAGE_SHIFT);
> I actually think the original is clearer.
I wish you guys would actually review the correct code.
In the actual good driver that has any shape of proper code all these issue
* config defaults gone, multiple-devices multiple-memory ranges fully supported
hot plug style.
* above shifts cruft completely gone it is left overs from brd.c and
its page usage.
* getgeo fixed to do what we realy want by the only application on earth
that still uses it, fdisk. All other partitioners do not call it at all.
Why are we reviewing dead code ?
"Elliott, Robert (Server Storage)" <Elliott(a)hp.com> writes:
>> +config BLK_DEV_PMEM_COUNT
>> + int "Default number of PMEM disks"
>> + default "4"
> For real use I think a default of 1 would be better.
For "real" use, it will be the number of actual DIMMs, not a config
option, I would think. I don't take any issue with defaulting to 4.
This will go away.
>> + pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
>> + if (!pmem->pmem_queue)
>> + goto out_free_dev;
> General comment:
> This driver should be blk-mq based. Not doing so loses
> out on a number of SMP and NUMA performance improvements.
> For example, blk_alloc_queue calls blk_alloc_queue_node
> with NUMA_NO_NODE. If it called blk_mq_init_queue, then
> each CPU would get structures located on its node.
No need to use blk-mq just to set the numa node, as the driver could
just call blk_alloc_queue_node itself. I'd chalk this up to another
thing that could be fixed when the driver is used for actual hardware
that describes its own proximity domain. Further, there is no DMA
engine, here, so what is the benefit of using blk-mq? I/O completes in
the submission path, and I don't see any locking that's preventing
>> + blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
>> + blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
> Many storage controllers support 1 MiB IOs now, and increasing
> amounts of software feel comfortable generating those sizes.
> That means this should be at least 2048.
>> + blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
> It might be appropriate to call blk_queue_rq_timeout
> and set up a very short timeout, since persistent memory is
> very fast. I'm not sure what the default is for non-blk-mq
> drivers; for blk-mq, I think it is 30 seconds, which is
> far too long for memory based storage.
If you timeout on a memcpy, then we've definitely got problems. :)