[PATCH v2 0/5] dax: handling of media errors
by Vishal Verma
Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch from Matthew removes the zeroout path from dax
entirely, making zeroout operations always go through the driver.
The motivation is that if a backing device has media errors and we
create a sparse file on it, we don't want the initial zeroing to
happen via dax; we want to give the block driver a chance to clear
the errors.
One pending item is addressing clear_pmem usages in dax.c. clear_pmem is
'unsafe' as it attempts to simply memcpy, and does not go through the driver.
We have a few options for solving this:
1. Remove all usages of clear_pmem that are not sector-aligned. For the
ones that are aligned, replace them with a bio submission that goes
through the driver to clear errors.
2. Export from the block layer either an API to zero sub-sector ranges
or, more generally, one to clear errors in a range. The dax attempts to
clear_pmem can then use either of these and not be hit by media errors.
I'll send out a v3 with a crack at option 1, but I wanted to get these
changes (especially the ones in xfs) out for review.
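Option 1 depends on telling sector-aligned ranges apart from sub-sector ones. A minimal userspace sketch of that check follows, assuming 512-byte sectors; range_is_sector_aligned() is a hypothetical helper for illustration, not a function from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: 512-byte logical sectors */

/* Hypothetical helper: true if [offset, offset + len) covers whole sectors. */
static bool range_is_sector_aligned(uint64_t offset, uint64_t len)
{
	return len != 0 &&
	       offset % SECTOR_SIZE == 0 &&
	       len % SECTOR_SIZE == 0;
}
```

Under this model a range such as (4096, 1024) could be turned into a bio, while (100, 512) or (512, 100) could not.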
The fifth patch changes all the callers of dax_do_io to check for
EIO, and fall back to direct_IO as needed. This forces the IO to
go through the block driver, which can attempt to clear the error.
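The caller-side fallback can be modeled in userspace roughly as below. This is only a sketch: struct fake_dev, fake_dax_io() and fake_direct_io() are hypothetical stand-ins rather than kernel interfaces, and the assumption that the driver path clears the error is built into the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device with one known bad block. */
struct fake_dev {
	bool has_badblock;
};

/* Models the dax path: a plain memcpy cannot get past a media error. */
static int fake_dax_io(struct fake_dev *dev)
{
	return dev->has_badblock ? -EIO : 0;
}

/* Models the driver path: assumed here to clear the error on a write. */
static int fake_direct_io(struct fake_dev *dev)
{
	dev->has_badblock = false;
	return 0;
}

/* Caller-side fallback: retry through the driver only on -EIO. */
static int do_io_with_fallback(struct fake_dev *dev)
{
	int rc = fake_dax_io(dev);

	if (rc == -EIO)
		rc = fake_direct_io(dev);
	return rc;
}
```

Keeping the retry in the caller, rather than in a wrapper around the dax call, matches the v2 change of leaving the fallback decision to each filesystem.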
v2:
- Use blkdev_issue_zeroout in xfs instead of sb_issue_zeroout (Christoph)
- Un-wrapper-ize dax_do_io and leave the fallback to direct_IO to callers
(Christoph)
- Rebase to v4.6-rc1 (fixup a couple of conflicts in ext4 and xfs)
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Vishal Verma (2):
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: handle media errors in dax_do_io
arch/powerpc/sysdev/axonram.c | 10 +++++-----
block/ioctl.c | 9 ---------
drivers/block/brd.c | 9 +++++----
drivers/nvdimm/pmem.c | 17 +++++++++++++----
drivers/s390/block/dcssblk.c | 12 ++++++------
fs/block_dev.c | 19 +++++++++++++++----
fs/dax.c | 36 ++----------------------------------
fs/ext2/inode.c | 29 ++++++++++++++++++-----------
fs/ext4/indirect.c | 18 +++++++++++++-----
fs/ext4/inode.c | 21 ++++++++++++++-------
fs/xfs/xfs_aops.c | 14 ++++++++++++--
fs/xfs/xfs_bmap_util.c | 15 ++++-----------
include/linux/blkdev.h | 3 +--
include/linux/dax.h | 1 -
14 files changed, 108 insertions(+), 105 deletions(-)
--
2.5.5
[PATCH v2 00/25] replace ioremap_{cache|wt} with memremap
by Dan Williams
Changes since v1 [1]:
1/ Drop the attempt at unifying ioremap() prototypes, just focus on
converting ioremap_cache and ioremap_wt over to memremap (Christoph)
2/ Drop the unrelated cleanups to use %pa in __ioremap_caller (Thomas)
3/ Add support for memremap() attempts on "System RAM" to simply return
the kernel virtual address for that range. ARM depends on this
functionality in ioremap_cache() and ACPI was open coding a similar
solution. (Mark)
4/ Split the conversions of ioremap_{cache|wt} into separate patches per
driver / arch.
5/ Fix bisection breakage and other reports from 0day-kbuild
---
While developing the pmem driver we noticed that the __iomem annotation
on the return value from ioremap_cache() was being mishandled by several
callers. We also observed that all of the call sites expected to be
able to treat the return value from ioremap_cache() as a normal
(non-__iomem) pointer to memory.
This patchset takes the opportunity to clean up the above confusion as
well as a few issues with the ioremap_{cache|wt} interface, including:
1/ Eliminating the possibility of function prototypes differing between
architectures by defining a central memremap() prototype that takes
flags to determine the mapping type.
2/ Returning NULL rather than falling back silently to a different
mapping-type. This allows drivers to be stricter about the
mapping-type fallbacks that are permissible.
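To illustrate the second point, here is a small userspace model of a flag-driven remap that fails instead of silently downgrading. The flag names MAP_WB and MAP_WT and the function remap_model() are assumptions made up for this sketch; the real flag names and prototype are whatever the series defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapping-type flags; the real names come from the series. */
#define MAP_WB (1u << 0) /* write-back (cached) */
#define MAP_WT (1u << 1) /* write-through */

static char fake_region[64]; /* stands in for the mapped range */

/*
 * Model of a flag-driven remap. 'requested' is expected to name a single
 * mapping type; the call succeeds only if that type is supported, and
 * never falls back to a different type on its own.
 */
static void *remap_model(unsigned int supported, unsigned int requested)
{
	if (requested == 0 || (supported & requested) != requested)
		return NULL;
	return fake_region;
}

/* A driver that tolerates write-through as second choice says so itself. */
static void *remap_wb_then_wt(unsigned int supported)
{
	void *addr = remap_model(supported, MAP_WB);

	if (!addr)
		addr = remap_model(supported, MAP_WT);
	return addr;
}
```

The fallback policy lives in the caller, so a driver that must have a cached mapping can simply treat NULL as a hard failure.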
[1]: http://marc.info/?l=linux-arm-kernel&m=143735199029255&w=2
---
Dan Williams (22):
mm: enhance region_is_ram() to distinguish 'unknown' vs 'mixed'
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
intel_iommu: fix leaked ioremap mapping
arch: introduce memremap()
arm: switch from ioremap_cache to memremap
x86: switch from ioremap_cache to memremap
gma500: switch from acpi_os_ioremap to ioremap
i915: switch from acpi_os_ioremap to ioremap
acpi: switch from ioremap_cache to memremap
toshiba laptop: replace ioremap_cache with ioremap
memconsole: fix __iomem mishandling, switch to memremap
visorbus: switch from ioremap_cache to memremap
intel-iommu: switch from ioremap_cache to memremap
libnvdimm, pmem: switch from ioremap_cache to memremap
pxa2xx-flash: switch from ioremap_cache to memremap
sfi: switch from ioremap_cache to memremap
fbdev: switch from ioremap_wt to memremap
pmem: switch from ioremap_wt to memremap
arch: remove ioremap_cache, replace with arch_memremap
arch: remove ioremap_wt, replace with arch_memremap
pmem: convert to generic memremap
Toshi Kani (3):
mm, x86: Fix warning in ioremap RAM check
mm, x86: Remove region_is_ram() call from ioremap
mm: Fix bugs in region_is_ram()
arch/arc/include/asm/io.h | 1
arch/arm/Kconfig | 1
arch/arm/include/asm/io.h | 13 +++-
arch/arm/include/asm/xen/page.h | 4 +
arch/arm/mach-clps711x/board-cdb89712.c | 2 -
arch/arm/mach-shmobile/pm-rcar.c | 2 -
arch/arm/mm/ioremap.c | 12 +++-
arch/arm/mm/nommu.c | 11 ++-
arch/arm64/Kconfig | 1
arch/arm64/include/asm/acpi.h | 10 +--
arch/arm64/include/asm/dmi.h | 8 +--
arch/arm64/include/asm/io.h | 8 ++-
arch/arm64/kernel/efi.c | 9 ++-
arch/arm64/kernel/smp_spin_table.c | 19 +++---
arch/arm64/mm/ioremap.c | 20 ++----
arch/avr32/include/asm/io.h | 1
arch/frv/Kconfig | 1
arch/frv/include/asm/io.h | 17 ++---
arch/frv/mm/kmap.c | 6 ++
arch/ia64/Kconfig | 1
arch/ia64/include/asm/io.h | 11 +++
arch/ia64/kernel/cyclone.c | 2 -
arch/m32r/include/asm/io.h | 1
arch/m68k/Kconfig | 1
arch/m68k/include/asm/io_mm.h | 14 +---
arch/m68k/include/asm/io_no.h | 12 ++--
arch/m68k/include/asm/raw_io.h | 4 +
arch/m68k/mm/kmap.c | 17 +++++
arch/m68k/mm/sun3kmap.c | 6 ++
arch/metag/include/asm/io.h | 3 -
arch/microblaze/include/asm/io.h | 1
arch/mn10300/include/asm/io.h | 1
arch/nios2/include/asm/io.h | 1
arch/powerpc/kernel/pci_of_scan.c | 2 -
arch/s390/include/asm/io.h | 1
arch/sh/Kconfig | 1
arch/sh/include/asm/io.h | 20 ++++--
arch/sh/mm/ioremap.c | 10 +++
arch/sparc/include/asm/io_32.h | 1
arch/sparc/include/asm/io_64.h | 1
arch/sparc/kernel/pci.c | 3 -
arch/tile/include/asm/io.h | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/efi.h | 3 +
arch/x86/include/asm/io.h | 17 +++--
arch/x86/kernel/crash_dump_64.c | 6 +-
arch/x86/kernel/kdebugfs.c | 8 +--
arch/x86/kernel/ksysfs.c | 28 ++++-----
arch/x86/mm/ioremap.c | 76 ++++++++++--------------
arch/xtensa/Kconfig | 1
arch/xtensa/include/asm/io.h | 9 ++-
drivers/acpi/apei/einj.c | 9 ++-
drivers/acpi/apei/erst.c | 6 +-
drivers/acpi/nvs.c | 6 +-
drivers/acpi/osl.c | 70 ++++++----------------
drivers/char/toshiba.c | 2 -
drivers/firmware/google/memconsole.c | 7 +-
drivers/gpu/drm/gma500/opregion.c | 2 -
drivers/gpu/drm/i915/intel_opregion.c | 2 -
drivers/iommu/intel-iommu.c | 10 ++-
drivers/iommu/intel_irq_remapping.c | 4 +
drivers/isdn/icn/icn.h | 2 -
drivers/mtd/devices/slram.c | 2 -
drivers/mtd/maps/pxa2xx-flash.c | 4 +
drivers/mtd/nand/diskonchip.c | 2 -
drivers/mtd/onenand/generic.c | 2 -
drivers/nvdimm/Kconfig | 2 -
drivers/pci/probe.c | 3 -
drivers/pnp/manager.c | 2 -
drivers/scsi/aic94xx/aic94xx_init.c | 7 --
drivers/scsi/arcmsr/arcmsr_hba.c | 5 --
drivers/scsi/mvsas/mv_init.c | 15 +----
drivers/scsi/sun3x_esp.c | 2 -
drivers/sfi/sfi_core.c | 4 +
drivers/staging/comedi/drivers/ii_pci20kc.c | 1
drivers/staging/unisys/visorbus/visorchannel.c | 16 +++--
drivers/staging/unisys/visorbus/visorchipset.c | 17 +++--
drivers/tty/serial/8250/8250_core.c | 2 -
drivers/video/fbdev/Kconfig | 2 -
drivers/video/fbdev/amifb.c | 5 +-
drivers/video/fbdev/atafb.c | 5 +-
drivers/video/fbdev/hpfb.c | 6 +-
drivers/video/fbdev/ocfb.c | 1
drivers/video/fbdev/s1d13xxxfb.c | 3 -
drivers/video/fbdev/stifb.c | 1
include/acpi/acpi_io.h | 6 +-
include/asm-generic/io.h | 8 ---
include/asm-generic/iomap.h | 4 -
include/linux/io-mapping.h | 2 -
include/linux/io.h | 9 +++
include/linux/mtd/map.h | 2 -
include/linux/pmem.h | 26 +++++---
include/video/vga.h | 2 -
kernel/Makefile | 2 +
kernel/memremap.c | 74 +++++++++++++++++++++++
kernel/resource.c | 43 +++++++-------
lib/Kconfig | 5 +-
lib/devres.c | 13 +---
lib/pci_iomap.c | 7 +-
tools/testing/nvdimm/Kbuild | 4 +
tools/testing/nvdimm/test/iomap.c | 34 ++++++++---
101 files changed, 482 insertions(+), 398 deletions(-)
create mode 100644 kernel/memremap.c
[PATCH v8 00/10] nvdimm: Add an IOCTL pass thru for DSM calls
by Jerry Hoemann
The NVDIMM code in the kernel supports an IOCTL interface to user
space based upon the Intel Example DSM:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
This interface cannot be used by other NVDIMM DSMs that support
incompatible functions.
An alternative DSM specification for Type N DSM being developed
by Hewlett Packard Enterprise can be found at:
https://github.com/HewlettPackard/hpe-nvm/tree/master/Documentation
To accommodate multiple and conflicting DSM specifications, this patch
set adds a generic "pass-thru" IOCTL interface which is not tied to
a particular DSM.
A new _IOC_NR ND_CMD_CALL == "10" is added for the pass thru call.
The new data structure nd_cmd_pkg serves as a wrapper for the
pass-thru calls. This wrapper supplies the data that the kernel
needs to make the _DSM call.
Unlike the definitions of the _DSM functions themselves, the nd_cmd_pkg
provides the calling information (input/output sizes) in a uniform
manner, making the kernel marshaling of the arguments
straightforward.
This shifts the marshaling burden from the kernel to the user
space application while still permitting the kernel to internally
call _DSM functions.
The kernel functions __nd_ioctl and acpi_nfit_ctl were modified
to accommodate ND_CMD_CALL.
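As a rough illustration of such a wrapper, the sketch below shows a header that carries the input and output sizes uniformly, plus a helper computing how many bytes the kernel would need to copy in. The layout and most field names are assumptions for this example (nd_family and nd_fw_size echo names from the changelog below); the authoritative definition is the one the patch adds to include/uapi/linux/ndctl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative wrapper only; not the structure from the patch.
 * The payload holds the input bytes followed by room for the output.
 */
struct pkg_sketch {
	uint64_t nd_family;   /* which command family / DSM to call */
	uint64_t nd_command;  /* function number within that family */
	uint32_t nd_size_in;  /* bytes of input payload */
	uint32_t nd_size_out; /* bytes reserved for output */
	uint32_t nd_fw_size;  /* bytes the firmware wanted to return */
	uint32_t reserved;    /* pads the header to a multiple of 8 bytes */
	unsigned char payload[];
};

/* Total envelope size: header plus both payload areas. */
static size_t pkg_total_size(const struct pkg_sketch *pkg)
{
	return sizeof(*pkg) + (size_t)pkg->nd_size_in + pkg->nd_size_out;
}
```

Because the sizes sit at fixed offsets in the header, the kernel can read the header first and then know how much more to copy, without understanding the individual _DSM function.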
Changes in version 8:
---------------------
1. augmented family_to_uuid() to return the uuid. This addresses a bug
in the prior version where acpi_nfit_ctl wasn't updating uuid
with the value associated with the command family.
2. patch 0006 changes the name of nvdimm_bus_descriptor.dsm_mask to .cmd_mask
3. patch 0008 adds the field cmd_ioctl, which indicates whether the kernel
supports the full ioctl, as with the Intel Example DSM.
4. patch 0009 determines whether the kernel supports the full
cmd_ioctl for that DSM, and updates the commands_show function
to invert the sense of the display of commands. All DSMs support
pass-thru; only the Intel Example DSM supports the full
ioctl interface.
5. patch 0010 adds an explicit ioctl interface to return the command mask.
This was done in part to avoid an "unknown" command in sysfs.
Changes in version 7:
--------------------
0. change name ND_CMD_CALL_DSM to ND_CMD_CALL
- part of abstracting out DSM missed in version 6.
1. change name in struct nd_call_dsm
a) "ncp_" -> "nd_"
b) ncp_pot_size -> nd_fw_size
c) ncp_type -> nd_family
o) cascade name changes to other patches
2. Expanded comment around data structure nd_cmd_pkg
3. At Dan's request, hard coding "root" UUID.
a) retract extension of dsm_uuid to nvdimm_bus_descriptor.
b) reverted nfit.c/acpi_nfit_init_dsms() with the exception of
allowing function 0 in mask.
4. At Dan's request, removed "rev" from nd_cmd_pkg. Hard-coding
use of rev "1" in acpi_nfit_ctl.
Changes in version 6:
---------------------
Built against
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git
libnvdimm-pending
0. Patches "Clean-up access mode check" and "Fix security issue with DSM IOCTL"
already in above libnvdimm-pending. So omitted here.
1. Incorporated changes from Dan's RFC patch set
https://lists.01.org/pipermail/linux-nvdimm/2016-January/004049.html
2. Dan asked me to abstract out the DSM aspects from the nd_cmd_dsmcall_pkg.
This became nd_cmd_pkg. UUIDs are no longer passed in from
user applications.
3. To accommodate multiple UUIDS, added table cmd_type_tbl which is used
to determine UUID for the acpi object by calling function 0 for
each UUID in table until success.
This table also provides a MASK field that the kernel can use
to exclude functions being called.
This table can be thought of as a list of "acceptable" DSMs.
4. The cmd_type_tbl is also used by acpi_nfit_ctl to map the
external handle of calls to internal handle, UUID.
Note, code only validates that the requested type of call is one in
cmd_type_tbl, but it might not necessarily be the same found during
acpi_nfit_add_dimm. The ACPI SPEC appears to allow and firmware
does implement multiple UUID per object.
In the case where type is in table, but the UUID isn't supported
by the underlying firmware, firmware shall return an error when
called.
This allows for use of a secondary DSM on an object. This could
be considered a feature or a defect. This can be tightened
up if needed.
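The table-driven mapping described in items 3 and 4 can be sketched as follows. Everything here is hypothetical: the entry layout, the table name cmd_type_tbl_sketch, the placeholder UUID strings and the mask values are invented for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one acceptable DSM family. */
struct cmd_type_entry {
	uint64_t family;      /* external handle passed by user space */
	const char *uuid_str; /* internal handle: UUID used for the _DSM call */
	uint64_t func_mask;   /* bit n set => function n may be called */
};

/* Placeholder values, not real DSM UUIDs or masks. */
static const struct cmd_type_entry cmd_type_tbl_sketch[] = {
	{ 1, "00000000-0000-0000-0000-000000000001", 0x3fe },
	{ 2, "00000000-0000-0000-0000-000000000002", 0x00e },
};

/* Map family -> UUID, refusing families or functions outside the table. */
static const char *lookup_uuid(uint64_t family, unsigned int func)
{
	size_t n = sizeof(cmd_type_tbl_sketch) / sizeof(cmd_type_tbl_sketch[0]);

	for (size_t i = 0; i < n; i++) {
		const struct cmd_type_entry *e = &cmd_type_tbl_sketch[i];

		if (e->family != family)
			continue;
		if (func >= 64 || !(e->func_mask & (1ULL << func)))
			return NULL; /* function excluded by the mask */
		return e->uuid_str;
	}
	return NULL; /* family not in the list of acceptable DSMs */
}
```

Note that a lookup like this only validates the request against the table; as the text says, it does not prove that the firmware behind a given object implements that UUID.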
Changes in version 5:
---------------------
0. Fixed submit comment for drivers/acpi/utils.c.
Changes in version 4:
---------------------
0. Added patch to correct the parameter types passed to acpi_evaluate_dsm.
ACPI defines the arguments rev and func as 64-bit quantities, and the ioctl
exports rev and func to user space. We want those to match the ACPI spec.
Also modified acpi_evaluate_dsm_typed and acpi_check_dsm, which had a
similar issue.
1. nd_cmd_dsmcall_pkg: rearranged a reserved field and rounded up the total
size to a 16-byte boundary.
2. Created stand alone patch for the pre-existing security issue related
to "read only" IOCTL calls.
3. Added patch for increasing envelope size of IOCTL. Needed to
be able to read in the wrapper to know remaining size to copy in.
Note: in_env, out_env are statics sized based upon this change.
4. Moved copyin code to table driven nd_cmd_desc
Note, the last 40 lines or so of acpi_nfit_ctl will not return _DSM
data unless the size allocated in user space buffer equals
out_obj->buffer.length.
The semantic we want in the pass thru case is to return as much
of the _DSM data as the user space buffer would accommodate.
Hence, in acpi_nfit_ctl I have retained the line:
memcpy(pkg->dsm_buf + pkg->h.dsm_in,
out_obj->buffer.pointer,
min(pkg->h.dsm_size, pkg->h.dsm_out));
and the early return from the function.
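The "return as much as fits" semantic can be shown in isolation with a small userspace helper. The function name and parameters are illustrative assumptions; only the min()-bounded memcpy mirrors the line quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy as much firmware output as the caller's buffer can hold and
 * return the number of bytes actually copied.
 */
static size_t copy_out_truncated(void *user_buf, size_t user_len,
				 const void *fw_buf, size_t fw_len)
{
	size_t n = fw_len < user_len ? fw_len : user_len;

	memcpy(user_buf, fw_buf, n);
	return n;
}
```

A caller can compare the returned count with the size the firmware reported to detect that the output was truncated and retry with a larger buffer.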
Changes in version 3:
---------------------
1. Changed name ND_CMD_PASSTHRU to ND_CMD_CALL_DSM.
2. Value of ND_CMD_CALL_DSM is 10, not 100.
3. Changed name of nd_passthru_pkg to nd_cmd_dsmcall_pkg.
4. Removed separate functions for handling ND_CMD_CALL_DSM.
Moved functionality to __nd_ioctl and acpi_nfit_ctl proper.
The resultant code looks very different from prior versions.
5. BUGFIX: __nd_ioctl: Change the if read_only switch to use
_IOC_NR cmd (not ioctl_cmd) for better protection.
Do we want to make a stand alone patch for this issue?
Changes in version 2:
---------------------
1. Cleanup access mode check in nd_ioctl and nvdimm_ioctl.
2. Change name of ndn_pkg to nd_passthru_pkg
3. Adjust sizes in nd_passthru_pkg. DSM integers are 64 bit.
4. No new ioctl type, instead tunnel into the existing number space.
5. Push down by one function level the point where the ioctl cmd type is determined.
6. re-work diagnostic print/dump message in pass-thru functions.
Jerry Hoemann (10):
ACPI / util: Fix acpi_evaluate_dsm() argument type
nvdimm: Add wrapper for IOCTL pass thru
nvdimm: Increase max envelope size for IOCTL
nvdimm: Add UUIDs
nvdimm: Add IOCTL pass thru functions
libnvdimm: nvdimm_bus_descriptor field name change
tools/testing/nvdimm: 'call_dsm' support
nvdimm: command ioctl support
nvdimm: sysfs shows which dsm support full command ioctl.
nvdimm: Add ioctl to return command mask.
drivers/acpi/nfit.c | 154 +++++++++++++++++++++++++++++++++------
drivers/acpi/nfit.h | 6 ++
drivers/acpi/utils.c | 4 +-
drivers/nvdimm/bus.c | 55 +++++++++++++-
drivers/nvdimm/core.c | 3 +-
drivers/nvdimm/dimm_devs.c | 12 ++-
drivers/nvdimm/nd-core.h | 1 +
include/acpi/acpi_bus.h | 6 +-
include/linux/libnvdimm.h | 6 +-
include/uapi/linux/ndctl.h | 62 ++++++++++++++++
tools/testing/nvdimm/test/nfit.c | 15 +++-
11 files changed, 283 insertions(+), 41 deletions(-)
--
1.7.11.3
[PATCH v1 00/10] uuid: convert users to generic UUID API
by Andy Shevchenko
There are a few functions here and there along with type definitions that provide
UUID API. This series consolidates everything under one hood and converts
current users.
This has been tested for a while internally, however it doesn't mean we covered
all possible cases (especially accuracy of UUID constants after conversion).
So, please test this as much as you can and provide your tag. We appreciate the
effort.
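One reason the accuracy of UUID constants deserves testing is byte order: a little-endian (GUID-style) UUID stores its first three fields byte-swapped relative to the textual form. The userspace sketch below models that layout; struct uuid_sketch, make_uuid_le() and format_uuid_le() are names invented for this example and are not the kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of a 16-byte UUID; not the kernel's type. */
struct uuid_sketch {
	uint8_t b[16];
};

/*
 * Build a little-endian UUID from its textual fields: the first three
 * fields are stored least significant byte first, the last eight bytes
 * are stored in the order written.
 */
static struct uuid_sketch make_uuid_le(uint32_t a, uint16_t b, uint16_t c,
				       const uint8_t d[8])
{
	struct uuid_sketch u = { {
		a & 0xff, (a >> 8) & 0xff, (a >> 16) & 0xff, (a >> 24) & 0xff,
		b & 0xff, (b >> 8) & 0xff,
		c & 0xff, (c >> 8) & 0xff,
	} };

	memcpy(&u.b[8], d, 8);
	return u;
}

/* Format back to the canonical 36-character string (plus NUL). */
static void format_uuid_le(const struct uuid_sketch *u, char out[37])
{
	static const int order[16] = { 3, 2, 1, 0, 5, 4, 7, 6,
				       8, 9, 10, 11, 12, 13, 14, 15 };
	char *p = out;

	for (int i = 0; i < 16; i++) {
		p += sprintf(p, "%02x", u->b[order[i]]);
		if (i == 3 || i == 5 || i == 7 || i == 9)
			*p++ = '-';
	}
	*p = '\0';
}
```

A round trip through these two helpers is a cheap way to check that a converted constant still prints as the same string it was written from.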
Andy Shevchenko (10):
lib/vsprintf: simplify UUID printing
lib/uuid: move generate_random_uuid() to uuid.c
lib/uuid: introduce few more generic helpers for UUID
lib/uuid: remove FSF address
ACPI: switch to use generic UUID API
device property: switch to use UUID API
sysctl: drop away useless label
sysctl: use generic UUID library
efi: redefine type, constant, macro from generic code
efivars: use generic UUID library
drivers/acpi/acpi_extlog.c | 8 +-
drivers/acpi/bus.c | 29 +------
drivers/acpi/nfit.c | 34 ++++----
drivers/acpi/nfit.h | 3 +-
drivers/acpi/property.c | 18 ++---
drivers/acpi/utils.c | 4 +-
drivers/char/random.c | 21 +----
drivers/char/tpm/tpm_crb.c | 9 +--
drivers/char/tpm/tpm_ppi.c | 20 ++---
drivers/gpu/drm/i915/intel_acpi.c | 14 ++--
drivers/gpu/drm/nouveau/nouveau_acpi.c | 20 +++--
drivers/gpu/drm/nouveau/nvkm/subdev/mxm/base.c | 9 +--
drivers/hid/i2c-hid/i2c-hid.c | 9 +--
drivers/iommu/dmar.c | 11 ++-
drivers/pci/pci-acpi.c | 11 ++-
drivers/pci/pci-label.c | 4 +-
drivers/thermal/int340x_thermal/int3400_thermal.c | 6 +-
drivers/usb/host/xhci-pci.c | 9 +--
fs/btrfs/volumes.c | 2 +-
fs/efivarfs/inode.c | 40 +---------
fs/ext4/ioctl.c | 1 +
fs/f2fs/file.c | 2 +-
fs/reiserfs/objectid.c | 2 +-
fs/ubifs/sb.c | 2 +-
include/acpi/acpi_bus.h | 10 ++-
include/linux/acpi.h | 2 +-
include/linux/efi.h | 14 +---
include/linux/pci-acpi.h | 2 +-
include/linux/random.h | 1 -
include/linux/uuid.h | 21 +++--
include/uapi/linux/uuid.h | 4 -
kernel/sysctl_binary.c | 30 +++----
lib/uuid.c | 96 +++++++++++++++++++++--
lib/vsprintf.c | 21 ++---
sound/soc/intel/skylake/skl-nhlt.c | 7 +-
35 files changed, 237 insertions(+), 259 deletions(-)
--
2.7.0
[PATCH 0/5] dax: handling of media errors
by Vishal Verma
Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch from Matthew removes the zeroout path from dax
entirely, making zeroout operations always go through the driver.
The motivation is that if a backing device has media errors and we
create a sparse file on it, we don't want the initial zeroing to
happen via dax; we want to give the block driver a chance to clear
the errors.
The fifth patch changes the behaviour of dax_do_io by adding a
wrapper around it that is passed all the arguments also needed by
__blockdev_do_direct_IO. If (the new) __dax_do_io fails with -EIO
due to a bad block, we simply retry with the direct_IO path which
forces the IO to go through the block driver, and can attempt to
clear the error.
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Vishal Verma (2):
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: handle media errors in dax_do_io
arch/powerpc/sysdev/axonram.c | 10 +++----
block/ioctl.c | 9 ------
drivers/block/brd.c | 9 +++---
drivers/nvdimm/pmem.c | 17 ++++++++---
drivers/s390/block/dcssblk.c | 12 ++++----
fs/block_dev.c | 7 +++--
fs/dax.c | 70 +++++++++++++++++++++----------------------
fs/ext2/inode.c | 12 ++++----
fs/ext4/indirect.c | 11 ++++---
fs/ext4/inode.c | 5 ++--
fs/xfs/xfs_aops.c | 7 +++--
fs/xfs/xfs_bmap_util.c | 9 ------
include/linux/blkdev.h | 3 +-
include/linux/dax.h | 7 +++--
14 files changed, 93 insertions(+), 95 deletions(-)
--
2.5.5
[RFC v2] [PATCH 0/10] DAX page fault locking
by Jan Kara
[Sorry for repost but I accidentally sent initial email without patches]
Hello,
this is my second attempt at the DAX page fault locking rewrite. Things now
work reasonably well; it has survived a full xfstests run on ext4. I guess
I need to do more mmap-targeted tests to unveil issues. Guys, what do you
use for DAX testing?
Changes since v1:
- handle wakeups of exclusive waiters properly
- fix cow fault races
- other minor stuff
General description
The basic idea is that we use a bit in an exceptional radix tree entry as
a lock bit and use it similarly to how page lock is used for normal faults.
That way we fix races between hole instantiation and read faults of the
same index. For now I have disabled PMD faults since there the issues with
page fault locking are even worse. Now that Matthew's multi-order radix tree
has landed, I can have a look into using that for proper locking of PMD faults
but first I want normal pages sorted out.
In the end I have decided to implement the bit locking directly in the DAX
code. Originally I was thinking we could provide something generic directly
in the radix tree code but the functions DAX needs are rather specific.
Maybe someone else will have a good idea how to distill some generally useful
functions out of what I've implemented for DAX but for now I didn't bother
with that.
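For readers unfamiliar with the idea, a lock bit embedded in an entry value can be modeled in userspace with C11 atomics as below. This is purely an illustration of the concept: the bit position and the function names are assumptions, and it makes no claim about how the patches themselves serialize access to the radix tree slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 of the slot value acts as the lock bit. */
#define ENTRY_LOCK_BIT ((uintptr_t)1)

/* Try to take the entry lock; returns true if this caller now owns it. */
static bool entry_trylock(_Atomic uintptr_t *slot)
{
	uintptr_t old = atomic_fetch_or(slot, ENTRY_LOCK_BIT);

	return !(old & ENTRY_LOCK_BIT);
}

/* Drop the lock; a real implementation would also wake up waiters here. */
static void entry_unlock(_Atomic uintptr_t *slot)
{
	atomic_fetch_and(slot, ~ENTRY_LOCK_BIT);
}
```

The rest of the value is left untouched by both operations, which is what lets the same slot carry its payload and its lock state at once.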
Honza
Re: [PATCH 12/12] dax: New fault locking
by Jan Kara
On Thu 31-03-16 15:20:00, NeilBrown wrote:
> On Wed, Mar 23 2016, Jan Kara wrote:
>
> > On Wed 23-03-16 08:10:42, NeilBrown wrote:
> >> On Sat, Mar 19 2016, Jan Kara wrote:
> >> >
> >> > Actually, after some thought I don't think the wakeup is needed except for
> >> > dax_pfn_mkwrite(). In the other cases we know there is no radix tree
> >> > exceptional entry and thus there can be no waiters for its lock...
> >> >
> >>
> >> I think that is fragile logic - though it may be correct at present.
> >>
> >> A radix tree slot can transition from "Locked exception" to "unlocked
> >> exception" to "deleted" to "struct page".
> >
> > Yes.
> >
> >> So it is absolutely certain that a thread cannot go to sleep after
> >> finding a "locked exception" and wake up to find a "struct page" ??
> >
> > With current implementation this should not happen but I agree entry
> > locking code should not rely on this.
> >
> >> How about a much simpler change.
> >> - new local variable "slept" in lookup_unlocked_mapping_entry() which
> >> is set if prepare_to_wait_exclusive() gets called.
> >> - if after __radix_tree_lookup() returns:
> >> (ret==NULL || !radix_tree_exceptional_entry(ret)) && slept
> >> then it calls wakeup immediately - because if it was waiting,
> >> something else might be to.
> >>
> >> That would cover all vaguely possible cases except dax_pfn_mkwrite()
> >
> > But how does this really help? If lookup_unlocked_mapping_entry() finds
> > there is no entry (and it was there before), the process deleting the entry
> > (or replacing it with something else) is responsible for waking up
> > everybody.
>
> "everybody" - yes. But it doesn't wake everybody does it? It just
> wakes one.
>
> + __wake_up(wq, TASK_NORMAL, 1, &key);
> ^one!
>
> Or am I misunderstanding how exclusive waiting works?
Ah, OK. I have already fixed that in my local version of the patches so
that we wake-up everybody after deleting the entry but forgot to tell you.
So I have there now:
__wake_up(wq, TASK_NORMAL, 0, &key);
Are you OK with the code now?
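For anyone following along, the difference between passing 1 and 0 as the exclusive-waiter count can be seen in a simplified model of the wake-up walk. This is a userspace sketch written for illustration, not the scheduler code; wake_up_model() and its parameters are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model: every waiter visited is woken, and the walk stops
 * once nr_exclusive exclusive waiters have been woken. With
 * nr_exclusive == 0 the stop condition is never met, so all are woken.
 */
static size_t wake_up_model(const bool *is_exclusive, size_t nr_waiters,
			    int nr_exclusive)
{
	size_t woken = 0;

	for (size_t i = 0; i < nr_waiters; i++) {
		woken++;
		if (is_exclusive[i] && !--nr_exclusive)
			break;
	}
	return woken;
}
```

With three exclusive waiters queued, a count of 1 wakes only the first, while a count of 0 wakes all three.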
Honza
--
Jan Kara <jack(a)suse.com>
SUSE Labs, CR
[ndctl PATCH] ndctl: SUSE spec file fixups
by Dan Williams
Per recent build.opensuse.org results:
1/ The SLE_12 target still mandates use of %defattr
2/ The SUSE-rpmlint system now enforces license names from
https://license.opensuse.org
3/ The libndctl version suffix needs to account for LIBNDCTL_AGE, not
just LIBNDCTL_CURRENT.
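The arithmetic behind point 3 follows the libtool convention that the soname major number is CURRENT minus AGE. A tiny sketch, with helper names invented for this example and version numbers chosen arbitrarily:

```c
#include <assert.h>
#include <stdio.h>

/* libtool convention: the soname major number is CURRENT - AGE. */
static int soname_major(int current, int age)
{
	assert(age >= 0 && age <= current);
	return current - age;
}

/* Build the package name, e.g. "libndctl6"; returns snprintf()'s result. */
static int format_lname(char *buf, size_t len, int current, int age)
{
	return snprintf(buf, len, "libndctl%d", soname_major(current, age));
}
```

For example, CURRENT=8 and AGE=2 would yield the suffix 6, not 8.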
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Makefile.am | 6 ++++--
ndctl.spec.in | 3 +++
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/Makefile.am b/Makefile.am
index 3f7dca3d37e8..7de3d3c3c9eb 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -56,12 +56,14 @@ CLEANFILES += $(noinst_SCRIPTS)
do_rhel_subst = sed -e 's,VERSION,$(VERSION),g' \
-e 's,DNAME,ndctl-devel,g' \
- -e 's,LNAME,ndctl-libs,g'
+ -e '/^%defattr.*/d' \
+ -e 's,LNAME,ndctl-libs,g'
do_sles_subst = sed -e 's,VERSION,$(VERSION),g' \
-e 's,DNAME,libndctl-devel,g' \
-e 's,%license,%doc,g' \
- -e 's,LNAME,libndctl$(LIBNDCTL_CURRENT),g'
+ -e 's,\(^License:.*GPL\)v2,\1-2.0,g' \
+ -e "s,LNAME,libndctl$$(($(LIBNDCTL_CURRENT) - $(LIBNDCTL_AGE))),g"
rhel/ndctl.spec: ndctl.spec.in Makefile.am
$(AM_V_GEN)$(MKDIR_P) rhel; $(do_rhel_subst) < $< > $@
diff --git a/ndctl.spec.in b/ndctl.spec.in
index 6f6e99bd743b..e24b31cc54f0 100644
--- a/ndctl.spec.in
+++ b/ndctl.spec.in
@@ -65,16 +65,19 @@ make check
%postun -n LNAME -p /sbin/ldconfig
%files
+%defattr(-,root,root)
%license licenses/GPLv2 licenses/BSD-MIT licenses/CC0
%{_bindir}/ndctl
%{_mandir}/man1/*
%files -n LNAME
+%defattr(-,root,root)
%doc README.md
%license COPYING licenses/BSD-MIT licenses/CC0
%{_libdir}/libndctl.so.*
%files -n DNAME
+%defattr(-,root,root)
%license COPYING
%{_includedir}/ndctl/
%{_libdir}/libndctl.so
[GIT PULL v2] libnvdimm, pmem: hook up memcpy_mcsafe
by Williams, Dan J
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-for-next
Now that memcpy_mcsafe() has landed, and the return value has been
clarified in commit cbf8b5a2b649 "x86/mm, x86/mce: Fix return
type/value for memcpy_mcsafe()", let's hook up its primary usage in the
pmem driver.
The compilation problems from the initial posting have been fixed, this
has appeared in a -next release with no reported issues, and it picked
up an ack from Ingo. There is no pressing need to merge this in 4.6-rc2.
However, if we wait until 4.7 the new memcpy_mcsafe() capability
will ship without a user in 4.6-final.
---
The following changes since commit f55532a0c0b8bb6148f4e07853b876ef73bc69ca:
Linux 4.6-rc1 (2016-03-26 16:03:24 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-for-next
for you to fetch changes up to fc0c2028135c7f75fce36b90e44efb8003a9173b:
x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem() (2016-03-28 17:19:31 -0700)
----------------------------------------------------------------
Dan Williams (1):
x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem()
commit fc0c2028135c7f75fce36b90e44efb8003a9173b
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue Mar 8 10:30:19 2016 -0800
x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem()
Update the definition of memcpy_from_pmem() to return 0 or a negative
error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
arch/x86/include/asm/pmem.h | 9 +++++++++
drivers/nvdimm/pmem.c | 4 ++--
include/linux/pmem.h | 22 ++++++++++++++++------
3 files changed, 27 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index bf8b35d2035a..fbc5e92e1ecc 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -47,6 +47,15 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
BUG();
}
+static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
+ size_t n)
+{
+ if (static_cpu_has(X86_FEATURE_MCE_RECOVERY))
+ return memcpy_mcsafe(dst, (void __force *) src, n);
+ memcpy(dst, (void __force *) src, n);
+ return 0;
+}
+
/**
* arch_wmb_pmem - synchronize writes to persistent memory
*
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ca5721c306bb..cc31c6f1f88e 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -99,7 +99,7 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
if (unlikely(bad_pmem))
rc = -EIO;
else {
- memcpy_from_pmem(mem + off, pmem_addr, len);
+ rc = memcpy_from_pmem(mem + off, pmem_addr, len);
flush_dcache_page(page);
}
} else {
@@ -295,7 +295,7 @@ static int pmem_rw_bytes(struct nd_namespace_common *ndns,
if (unlikely(is_bad_pmem(&pmem->bb, offset / 512, sz_align)))
return -EIO;
- memcpy_from_pmem(buf, pmem->virt_addr + offset, size);
+ return memcpy_from_pmem(buf, pmem->virt_addr + offset, size);
} else {
memcpy_to_pmem(pmem->virt_addr + offset, buf, size);
wmb_pmem();
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 3ec5309e29f3..ac6d872ce067 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -42,6 +42,13 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
BUG();
}
+static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
+ size_t n)
+{
+ BUG();
+ return -EFAULT;
+}
+
static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
struct iov_iter *i)
{
@@ -66,14 +73,17 @@ static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
#endif
/*
- * Architectures that define ARCH_HAS_PMEM_API must provide
- * implementations for arch_memcpy_to_pmem(), arch_wmb_pmem(),
- * arch_copy_from_iter_pmem(), arch_clear_pmem(), arch_wb_cache_pmem()
- * and arch_has_wmb_pmem().
+ * memcpy_from_pmem - read from persistent memory with error handling
+ * @dst: destination buffer
+ * @src: source buffer
+ * @size: transfer length
+ *
+ * Returns 0 on success negative error code on failure.
*/
-static inline void memcpy_from_pmem(void *dst, void __pmem const *src, size_t size)
+static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
+ size_t size)
{
- memcpy(dst, (void __force const *) src, size);
+ return arch_memcpy_from_pmem(dst, src, size);
}
static inline bool arch_has_pmem_api(void)
[PATCH v4 0/8] Support for transparent PUD pages for DAX files
by Matthew Wilcox
We have customer demand to use 1GB pages to map DAX files. Unlike the 2MB
page support, the Linux MM does not currently support PUD pages, so I have
attempted to add support for the necessary pieces for DAX huge PUD pages.
Filesystems still need work to allocate 1GB pages. With ext4, I can
only get 16MB of contiguous space, although it is aligned. With XFS,
I can get 80MB less than 1GB, and it's not aligned. The XFS problem
may be due to the small amount of RAM in my test machine.
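To make the size and alignment requirement concrete, the sketch below checks whether a contiguous extent can hold at least one naturally aligned 1GB window. It assumes a 1GB PUD as on x86-64; extent_fits_pud() is a hypothetical helper, and it ignores overflow for start addresses near the top of the 64-bit range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PUD_SIZE_SKETCH (1ULL << 30) /* assumption: 1GB PUD, as on x86-64 */

/*
 * Hypothetical check: does a 1GB-aligned, 1GB-long window fit inside
 * the contiguous extent [start, start + len)?
 */
static bool extent_fits_pud(uint64_t start, uint64_t len)
{
	/* First 1GB-aligned address at or after start. */
	uint64_t first = (start + PUD_SIZE_SKETCH - 1) & ~(PUD_SIZE_SKETCH - 1);

	return len >= PUD_SIZE_SKETCH &&
	       first - start <= len - PUD_SIZE_SKETCH;
}
```

By this test a 16MB aligned extent fails on size alone, and an unaligned extent needs to be noticeably longer than 1GB before an aligned window fits inside it.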
This patch set is against something approximately current -mm. I'd like
to thank Dave Chinner & Kirill Shutemov for their reviews of v1.
The conversion of pmd_fault & pud_fault to huge_fault is thanks to
Dave's poking, and Kirill spotted a couple of problems in the MM code.
Version 2 of the patch set is about 200 lines smaller (1016 insertions,
23 deletions in v1).
I've done some light testing using a program to mmap a block device
with DAX enabled, calling mincore() and examining /proc/smaps and
/proc/pagemap.
v4: Updated to current mmotm
Converted pud_trans_huge_lock to the same calling conventions as
pmd_trans_huge_lock.
Fill in vm_fault ->gfp_flags and ->pgoff, at Jan Kara's suggestion
Replace use of page table lock with pud_lock in __pud_alloc (cosmetic)
Fix compilation problems with various config settings
Convert dax_pmd_fault and dax_pud_fault to take a vm_fault instead of
individual pieces
Add copy_huge_pud() and follow_devmap_pud() so fork() should now work
Fix typo of PMD for PUD
v3: Rebased against current mmotm
v2: Reduced churn in filesystems by switching to ->huge_fault interface
Addressed concerns from Kirill
Matthew Wilcox (8):
mm: Convert an open-coded VM_BUG_ON_VMA
mm,fs,dax: Change ->pmd_fault to ->huge_fault
mm: Add support for PUD-sized transparent hugepages
mincore: Add support for PUDs
procfs: Add support for PUDs to smaps, clear_refs and pagemap
x86: Add support for PUD-sized transparent hugepages
dax: Support for transparent PUD pages
ext4: Support for PUD-sized transparent huge pages
Documentation/filesystems/dax.txt | 12 +-
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/paravirt.h | 11 ++
arch/x86/include/asm/paravirt_types.h | 2 +
arch/x86/include/asm/pgtable-2level.h | 19 +++
arch/x86/include/asm/pgtable-3level.h | 31 ++++
arch/x86/include/asm/pgtable.h | 134 +++++++++++++++
arch/x86/include/asm/pgtable_64.h | 13 ++
arch/x86/kernel/paravirt.c | 1 +
arch/x86/mm/pgtable.c | 31 ++++
fs/block_dev.c | 10 +-
fs/dax.c | 295 +++++++++++++++++++++++++---------
fs/ext2/file.c | 27 +---
fs/ext4/file.c | 60 +++----
fs/proc/task_mmu.c | 109 +++++++++++++
fs/xfs/xfs_file.c | 25 ++-
fs/xfs/xfs_trace.h | 2 +-
include/asm-generic/pgtable.h | 74 ++++++++-
include/asm-generic/tlb.h | 14 ++
include/linux/dax.h | 17 --
include/linux/huge_mm.h | 78 ++++++++-
include/linux/mm.h | 48 +++++-
include/linux/mmu_notifier.h | 14 ++
include/linux/pfn_t.h | 8 +
mm/gup.c | 7 +
mm/huge_memory.c | 246 ++++++++++++++++++++++++++++
mm/memory.c | 135 ++++++++++++++--
mm/mincore.c | 13 ++
mm/pagewalk.c | 19 ++-
mm/pgtable-generic.c | 14 ++
31 files changed, 1261 insertions(+), 212 deletions(-)
--
2.7.0.rc3