ACPI 6.1, Table 5-133, updates the NVDIMM Control Region Structure as follows:
- Valid Fields, Manufacturing Location, and Manufacturing Date
are carved out of the previously reserved range. The structure size is unchanged.
- IDs (SPD values) are stored as arrays of bytes (i.e. big-endian
format). The spec clarifies that they need to be represented
as arrays of bytes as well.
Patch 1 changes the NFIT driver to comply with ACPI 6.1.
Patch 2 adds a new sysfs file "id" to show NVDIMM ID defined in ACPI 6.1.
The patch-set applies on top of the acpica branch of linux-pm.git.
- Need to coordinate with ACPICA update (Bob Moore, Dan Williams)
- Integrate with ACPICA changes in struct acpi_nfit_control_region.
- Remove 'mfg_location' and 'mfg_date'. (Dan Williams)
- Rename 'unique_id' to 'id' and make this change as a separate patch.
Toshi Kani (2):
1/2 acpi/nfit: Update nfit driver to comply with ACPI 6.1
2/2 acpi/nfit: Add sysfs "id" for NVDIMM ID
drivers/acpi/nfit.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
This is certainly not the first time this has been brought up, but I'd like to try and get some consensus on the best way to move this forward. Allowing devices to talk directly improves performance and reduces latency by avoiding the use of staging buffers in system memory. Also, when both devices are behind a switch, it avoids the CPU entirely. Most current APIs (DirectGMA, PeerDirect, CUDA, HSA) that deal with this are pointer based. Ideally, we'd be able to take a CPU virtual address and get to a physical address, taking into account IOMMUs, etc. Having struct pages for the memory would allow it to work more generally and wouldn't require as much explicit support in drivers that wanted to use it.
Some use cases:
1. Storage devices streaming directly to GPU device memory
2. GPU device memory to GPU device memory streaming
3. DVB/V4L/SDI devices streaming directly to GPU device memory
4. DVB/V4L/SDI devices streaming directly to storage devices
Here is a relatively simple example of how this could work for testing. This is obviously not a complete solution.
- Device memory will be registered with the Linux memory sub-system by creating corresponding struct page structures for device memory
- get_user_pages_fast() will return corresponding struct pages when CPU address points to the device memory
- put_page() will deal with struct pages for device memory
Previously proposed solutions and related proposals:
1. DMA-API/PCI map_peer_resource support for peer-to-peer (http://www.spinics.net/lists/linux-pci/msg44560.html)
Pros: Low impact, already largely reviewed.
Cons: Requires explicit support in all drivers that want to support it; doesn't handle S/G in device memory.
2. ZONE_DEVICE IO
Direct I/O and DMA for persistent memory (https://lwn.net/Articles/672457/)
Add support for ZONE_DEVICE IO memory with struct pages. (https://patchwork.kernel.org/patch/8583221/)
Pros: Doesn't waste system memory for ZONE metadata.
Cons: CPU access to ZONE metadata is slow, and the metadata may be lost or corrupted on device reset.
3. RDMA subsystem DMA-BUF support (http://www.spinics.net/lists/linux-rdma/msg38748.html)
Pros: Uses the existing dma-buf interface.
Cons: dma-buf is handle based and requires explicit dma-buf support in drivers.
4. iopmem: A block device for PCIe memory (https://lwn.net/Articles/703895/)
5. Heterogeneous Memory Management (http://lkml.iu.edu/hypermail/linux/kernel/1611.2/02473.html)
6. Some new mmap-like interface that takes a userptr and a length and returns a dma-buf and offset?
A couple weeks back, in the course of reviewing the memcpy_nocache()
proposal from Brian, Linus subtly suggested that the pmem specific
memcpy_to_pmem() routine be moved to be implemented at the driver
"Quite frankly, the whole 'memcpy_nocache()' idea or (ab-)using
copy_user_nocache() just needs to die. It's idiotic.
As you point out, it's also fundamentally buggy crap.
Throw it away. There is no possible way this is ever valid or
portable. We're not going to lie and claim that it is.
If some driver ends up using 'movnt' by hand, that is up to that
*driver*. But no way in hell should we care about this one whit in
the sense of <linux/uaccess.h>."
This feedback also dovetails with another fs/dax.c design wart of being
hard coded to assume the backing device is pmem. We call the pmem
specific copy, clear, and flush routines even if the backing device
driver is one of the other 3 dax drivers (axonram, dcssblk, or brd).
There is no reason to spend CPU cycles flushing the cache after writing
to brd, for example, since it is using volatile memory for storage.
Moreover, the pmem driver might be fronting a volatile memory range
published by the ACPI NFIT, or the platform might have arranged to flush
CPU caches on power fail. This latter capability is a feature that has
appeared in embedded storage appliances ("legacy" / pre-NFIT nvdimm
So, this series:
1/ moves what was previously named "the pmem api" out of the global
namespace and into "that *driver*" (libnvdimm / pmem).
2/ arranges for dax to stop abusing copy_user_nocache() and implements a
libnvdimm-local memcpy that uses movnt
3/ makes cache maintenance optional by arranging for dax to call driver
specific copy and flush operations only if the driver publishes them.
4/ adds a module parameter that can be used to inform libnvdimm of a
platform-level flush-cache-on-power-fail capability.
These patches have a build success notification from the 0day kbuild robot
and pass the libnvdimm / ndctl unit tests. I am looking to take them
through the libnvdimm tree with acks from x86, block, dm etc...
Dan Williams (13):
x86, dax, pmem: remove indirection around memcpy_from_pmem()
block, dax: introduce dax_operations
x86, dax, pmem: introduce 'copy_from_iter' dax operation
dax, pmem: introduce an optional 'flush' dax operation
x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush
x86, dax, libnvdimm: move wb_cache_pmem() to libnvdimm
x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm
x86, libnvdimm, dax: stop abusing __copy_user_nocache
libnvdimm, pmem: implement cache bypass for all copy_from_iter() operations
libnvdimm, pmem: fix persistence warning
libnvdimm, nfit: enable support for volatile ranges
libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
libnvdimm, pmem: disable dax flushing for 'cache flush on fail' platforms
MAINTAINERS | 2
arch/powerpc/sysdev/axonram.c | 6 +
arch/x86/Kconfig | 1
arch/x86/include/asm/pmem.h | 121 ----------------------------
arch/x86/include/asm/string_64.h | 1
drivers/acpi/nfit/core.c | 15 ++-
drivers/block/brd.c | 6 +
drivers/md/dm.c | 6 +
drivers/nvdimm/Kconfig | 5 +
drivers/nvdimm/Makefile | 2
drivers/nvdimm/bus.c | 10 +-
drivers/nvdimm/claim.c | 9 +-
drivers/nvdimm/core.c | 2
drivers/nvdimm/dax_devs.c | 2
drivers/nvdimm/dimm_devs.c | 4 -
drivers/nvdimm/namespace_devs.c | 9 +-
drivers/nvdimm/nd-core.h | 9 ++
drivers/nvdimm/pfn_devs.c | 4 -
drivers/nvdimm/pmem.c | 46 ++++++++---
drivers/nvdimm/pmem.h | 20 +++++
drivers/nvdimm/region_devs.c | 52 ++++++++----
drivers/nvdimm/x86-asm.S | 71 ++++++++++++++++
drivers/nvdimm/x86.c | 84 +++++++++++++++++++
drivers/s390/block/dcssblk.c | 6 +
fs/block_dev.c | 6 +
fs/dax.c | 35 +++++++-
include/linux/blkdev.h | 10 ++
include/linux/libnvdimm.h | 9 ++
include/linux/pmem.h | 165 --------------------------------------
include/linux/string.h | 8 ++
include/linux/uio.h | 4 +
lib/Kconfig | 6 +
lib/iov_iter.c | 25 ++++++
tools/testing/nvdimm/Kbuild | 2
34 files changed, 405 insertions(+), 358 deletions(-)
delete mode 100644 arch/x86/include/asm/pmem.h
create mode 100644 drivers/nvdimm/x86-asm.S
create mode 100644 drivers/nvdimm/x86.c
delete mode 100644 include/linux/pmem.h
Tracepoints are the standard way to capture debugging and tracing
information in many parts of the kernel, including the XFS and ext4
filesystems. This series creates a tracepoint header for FS DAX and adds
the first few DAX tracepoints to the PMD fault handler. This allows the
tracing for DAX to be done in the same way as the filesystem tracing so
that developers can look at them together and get a coherent idea of what
the system is doing.
I do intend to add tracepoints to the normal 4k DAX fault path and to the
DAX I/O path, but I wanted to get feedback on the PMD tracepoints before I
went any further.
This series is based on Jan Kara's "dax: Clear dirty bits after flushing
I've pushed a git tree with this work here:
Ross Zwisler (6):
dax: fix build breakage with ext4, dax and !iomap
dax: remove leading space from labels
dax: add tracepoint infrastructure, PMD tracing
dax: update MAINTAINERS entries for FS DAX
dax: add tracepoints to dax_pmd_load_hole()
dax: add tracepoints to dax_pmd_insert_mapping()
MAINTAINERS | 4 +-
fs/Kconfig | 1 +
fs/dax.c | 78 ++++++++++++++----------
fs/ext2/Kconfig | 1 -
include/linux/mm.h | 14 +++++
include/linux/pfn_t.h | 6 ++
include/trace/events/fs_dax.h | 135 ++++++++++++++++++++++++++++++++++++++++++
7 files changed, 206 insertions(+), 33 deletions(-)
create mode 100644 include/trace/events/fs_dax.h
Here's v2 of my chardev cleanup patch-set. I've incorporated some
feedback and decided to extend the concept a little further. The new
helper function now includes both cdev_add and device_add, which
significantly simplifies every instance that called it.
Jason's also expressed an interest in creating a general solution to
the problem that occurs if a user tries to utilize a newly created
cdev right before device_add fails. This series doesn't address that
specifically but will make it much easier to do so in future work.
I've also added cdev_set_parent for the cases in IB that set
the kobject parent without using device_add. This is just to ensure
the parent setting code is private within char_dev.c and removes the
lines that appear suspect but are in fact correct. Dan's suggested
WARN_ON is included in this function.
Since the new helper function takes on a bit more than before,
the instance patches are a bit heavier. Thus, I've refrained from
collecting the acks and reviews I've already received. In a couple
of cases (mtd and scsi) the cleanup required was a bit more involved
than I would have liked and thus these patches probably need more
attention, review and testing. (Unfortunately, I don't have hardware
to actually test them.) Hopefully that process doesn't throw too big
a wrench in the overall series moving forward.
I've included Dan's cdev_leak patch in this series to avoid a merge
conflict between the two of us.
While the diff stats for the series are much heavier than in v1, we now
have a net loss of more than 100 lines! So this feels much more like a
Our story for this patch-set begins with a new driver I wrote and am in
the process of submitting upstream. That driver creates a fairly
standard char device and the code for it was copied from a similar
instance in device-dax. However, upon review, Greg Kroah-Hartman
noticed a suspicious line that assigned to the parent field of
the underlying kobject for the char device.
I removed that from my code and endeavoured to remove it from the
code I copied as well. However, Dan Williams pointed out that this
code is necessary for correct reference counting of the underlying
structures. This prompted me to do a fair bit more research and
investigation into what's going on, and I found it to be a common pattern
(although it is required for correctness, it is misleading and frequently
forgotten). This pattern is used in at least 15 places in the kernel
and probably should have been used in at least 5 more.
This patch-set aims to correct this and hopefully prevent future
developers from wasting their time on it. The first patch introduces
a new helper API as originally proposed by Dan Williams. Please
see the commit message for that patch for a longer description of the
problem and history.
The subsequent patches either update correct instances to use the
new API or fix incorrect usages to ensure the cdev correctly takes
a reference count using the new API (this is noted in those patches).
This moves all except four of the cdev.kobj.parent usages into the one
cdev function, the remaining four are in the infiniband subsystem and
I've left alone because that subsystem seems to make use of kobjects
directly (instead of struct devices). These are noted in patch 7 of
This series is based on v4.10 with the exception of the last patch
which is for my new driver which, as yet, has not been accepted
Dan Williams (1):
device-dax: fix cdev leak
Jason Gunthorpe (1):
IB/ucm: utilize new cdev_device_add helper function
Logan Gunthorpe (14):
chardev: add helper function to register char devs with a struct
device-dax: utilize new cdev_device_add helper function
input: utilize new cdev_device_add helper function
gpiolib: utilize new cdev_device_add helper function
tpm-chip: utilize new cdev_device_add helper function
platform/chrome: cros_ec_dev - utilize new cdev_device_add helper
infiniband: utilize the new cdev_set_parent function
iio:core: utilize new cdev_device_add helper function
media: utilize new cdev_device_add helper function
mtd: utilize new cdev_device_add helper function
rapidio: utilize new cdev_device_add helper function
rtc: utilize new cdev_device_add helper function
scsi: utilize new cdev_device_add helper function
switchtec: utilize new device_add_cdev helper function
drivers/char/tpm/tpm-chip.c | 19 ++-----
drivers/dax/dax.c | 33 ++++++------
drivers/gpio/gpiolib.c | 23 +++-----
drivers/iio/industrialio-core.c | 15 ++----
drivers/infiniband/core/ucm.c | 36 +++++++------
drivers/infiniband/core/user_mad.c | 4 +-
drivers/infiniband/core/uverbs_main.c | 2 +-
drivers/infiniband/hw/hfi1/device.c | 2 +-
drivers/input/evdev.c | 11 +---
drivers/input/joydev.c | 11 +---
drivers/input/mousedev.c | 11 +---
drivers/media/cec/cec-core.c | 16 ++----
drivers/media/media-devnode.c | 20 ++-----
drivers/mtd/ubi/build.c | 91 ++++++--------------------------
drivers/mtd/ubi/vmt.c | 49 ++++++-----------
drivers/pci/switch/switchtec.c | 15 ++----
drivers/platform/chrome/cros_ec_dev.c | 31 +++--------
drivers/rapidio/devices/rio_mport_cdev.c | 24 +++------
drivers/rtc/class.c | 14 +++--
drivers/rtc/rtc-dev.c | 17 ------
drivers/scsi/osd/osd_uld.c | 56 +++++++-------------
fs/char_dev.c | 67 +++++++++++++++++++++++
include/linux/cdev.h | 4 ++
23 files changed, 222 insertions(+), 349 deletions(-)
LTP tests on DAX show 2 issues, in msync03 and diotest4, on both XFS and ext4.
1. MAP_LOCKED && msync with MS_INVALIDATE, which should fail.
The flag checking code in msync looks OK, but is the VM_LOCKED vma flag
missing for DAX-mapped vmas? I guess DAX does not support that now?
Tracked by LTP test case "msync03".
2. O_DIRECT reads/writes with odd counts on DAX
Reading or writing 1 byte on a file opened with O_DIRECT is expected to
fail with EINVAL, but it succeeds.
I'm not sure whether this is an issue, please enlighten :)
Tracked by LTP test case "dio04 diotest4".
BTW, I am testing DAX with xfstests, LTP and other fs test
cases. If a case fails on DAX but passes on non-DAX,
I look into it and report it if it seems to be a real issue. I've
been doing this for a while; recently, I also started looking at
cases that fail on non-DAX and pass on DAX, inspired by
Darrick in another thread.
For now, test results look good. Apart from the above 2 issues,
which I've seen for a while and am not sure are real
issues, xfstests check -g auto shows no major regressions
between DAX and non-DAX. generic/403 is a new case and its
failures are under investigation; I'll report it if it turns out to be a real issue.
I think I've covered all the review comments, but do let me know in case
I missed something!
Changes in v2:
- Move checking functionality to a separate file (Dan, Jeff)
- Rename btt-structs.h to check.h (Dan)
- Don't provide a configure option for building the checker, always
build it in. (Dan, Jeff)
- Fix the Documentation example to also include disable-namespace (Linda)
- Update the description text to note the namespace needs to be disabled
before checking (Linda)
- Use util/size.h for sizes (Dan)
- Use --repair to do repairs instead of --dry-run to disable repairs (Dan)
- Fix btt_read_info short read error handling (Jeff)
- Simplify the map lookup/write routines (Jeff)
- Differentiate the use of BTT_PG_SIZE, sysconf(_SC_PAGESIZE), and SZ_4K
(for the fixed start offset) in the different places they're used (Jeff)
- Add the missing msync when copying over info2 (Jeff)
- Add unit tests to test the checker (Jeff)
- Add a missing error case check in do_xaction_namespace for check
- Add a --force option that allows running on an active namespace (Jeff)
- Add a bitmap test for checking all internal blocks are referenced exactly
once between the map and flog (Jeff)
- Remove unused #defines in check.h
- Add comments to explain what we do with raw_mode (Jeff)
- Add some sanity checking when parsing an arena's metadata (Jeff)
- Refactor some read-verify sequences into a helper that combines the two (Jeff)
- Additional bounds checking on the 'offset' in recover_first_sb attempt 3 (Jeff)
- Add a missing ACTION_DESTROY string in parse_namespace_options (Dan)
- Use uXX, and cpu_to_XX from ccan/endian (Dan)
- Move the fletcher64 routine to util/ as it is shared by builtin-dimm.c (Dan)
- Open the raw block device only once with O_EXCL instead of every time on
- Add a new 'inform' routine in util/usage.c, and use it for some non-critical
- Remove namespace_is_offline() from builtin-check.c. Instead, use
util_namespace_active() from util/json.c
- Add a missing return value check after info block restoration in
Vishal Verma (3):
ndctl: move the fletcher64 routine to util/
ndctl: add a BTT check utility
ndctl, test: Add a unit test for the BTT checker
Documentation/Makefile.am | 1 +
Documentation/ndctl-check-namespace.txt | 64 +++
Documentation/ndctl.txt | 1 +
Makefile.am | 7 +-
builtin.h | 1 +
ccan/bitmap/LICENSE | 1 +
ccan/bitmap/bitmap.c | 125 +++++
ccan/bitmap/bitmap.h | 243 +++++++++
contrib/ndctl | 3 +
licenses/LGPL-2.1 | 508 +++++++++++++++++
ndctl.spec.in | 5 +-
ndctl/Makefile.am | 1 +
ndctl/builtin-check.c | 938 ++++++++++++++++++++++++++++++++
ndctl/builtin-dimm.c | 18 +-
ndctl/builtin-xaction-namespace.c | 66 ++-
ndctl/check.h | 123 +++++
ndctl/ndctl.c | 1 +
test/Makefile.am | 5 +-
test/btt-check.sh | 167 ++++++
util/fletcher.c | 23 +
util/fletcher.h | 8 +
util/usage.c | 15 +
util/util.h | 9 +
23 files changed, 2311 insertions(+), 22 deletions(-)
create mode 100644 Documentation/ndctl-check-namespace.txt
create mode 120000 ccan/bitmap/LICENSE
create mode 100644 ccan/bitmap/bitmap.c
create mode 100644 ccan/bitmap/bitmap.h
create mode 100644 licenses/LGPL-2.1
create mode 100644 ndctl/builtin-check.c
create mode 100644 ndctl/check.h
create mode 100755 test/btt-check.sh
create mode 100644 util/fletcher.c
create mode 100644 util/fletcher.h
Commit e4decc90 fixed the UP case for 32-bit x86; however, it broke the
SMP case, which was working previously. Add an #ifdef so that the dummy
function only shows up for the 32-bit UP case.
Fixes: e4decc90 ("mm,x86: native_pud_clear missing on i386 build")
Reported-by: Alexander Kapshuk <alexander.kapshuk(a)gmail.com>
Signed-off-by: Dave Jiang <dave.jiang(a)intel.com>
arch/x86/include/asm/pgtable-3level.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 50d35e3..8f50fb3 100644
@@ -121,9 +121,11 @@ static inline void native_pmd_clear(pmd_t *pmd)
*(tmp + 1) = 0;
static inline void native_pud_clear(pud_t *pudp)
static inline void pud_clear(pud_t *pudp)