[LSF/MM ATTEND] blk-mq polling, nvme, pmem (for iomem) and non-block based SSDs.
by Stephen Bates
Hi
I would like to attend LSF/MM 2016 to participate in discussions around the optimization of the block layer and memory management for low-latency NVM technologies. I'd be very interested in discussing where we can take the ongoing work to add polling to the block layer, and how to tie that into file systems and applications. I would also be keen to discuss how we might extend the recent persistent-memory work to I/O memory (e.g. PCIe devices with large, persistent memory regions).
I am also interested in topics associated with non-block-based hardware devices, including NVDIMMs, Open-Channel SSDs and persistent memory exposed on PCIe devices.
I spend quite a bit of time working on elements of the block layer (especially NVMe), the RDMA stack and (more recently) the NVDIMM/PMEM/DAX sections of the kernel.
Cheers
Stephen
[PATCH v3 0/8] Support for transparent PUD pages for DAX files
by Matthew Wilcox
From: Matthew Wilcox <willy(a)linux.intel.com>
Andrew, I think this is ready for a spin in -mm.
v3: Rebased against current mmotm
v2: Reduced churn in filesystems by switching to ->huge_fault interface
Addressed concerns from Kirill
We have customer demand to use 1GB pages to map DAX files. Unlike the 2MB
page support, the Linux MM does not currently support PUD pages, so I have
attempted to add support for the necessary pieces for DAX huge PUD pages.
Filesystems still need work to allocate 1GB pages. With ext4, I can
only get 16MB of contiguous space, although it is aligned. With XFS,
I can get within 80MB of 1GB, but it's not aligned. The XFS problem
may be due to the small amount of RAM in my test machine.
This patch set is against something approximately current -mm. I'd like
to thank Dave Chinner & Kirill Shutemov for their reviews of v1.
The conversion of pmd_fault & pud_fault to huge_fault is thanks to
Dave's poking, and Kirill spotted a couple of problems in the MM code.
Version 2 of the patch set is about 200 lines smaller (1016 insertions,
23 deletions in v1).
I've done some light testing using a program to mmap a block device
with DAX enabled, calling mincore() and examining /proc/smaps and
/proc/pagemap.
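For reference, a minimal sketch of that kind of test (the device path,
mapping size and page size are assumptions, not the exact program used):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#define MAP_LEN	(1UL << 30)	/* 1GB, i.e. PUD-sized */
int main(void)
{
	/* /dev/pmem0 is an assumption; any DAX-capable block device of
	 * at least MAP_LEN bytes works.
	 */
	int fd = open("/dev/pmem0", O_RDWR);
	unsigned char *vec;
	void *addr;
	if (fd < 0) {
		perror("open");
		return 1;
	}
	addr = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* Fault the mapping in with a write... */
	memset(addr, 0xab, MAP_LEN);
	/* ...then ask mincore() which pages are resident. */
	vec = malloc(MAP_LEN / 4096);
	if (mincore(addr, MAP_LEN, vec) < 0)
		perror("mincore");
	/* Pause so /proc/<pid>/smaps and pagemap can be inspected. */
	printf("pid %d mapped %p; press Enter to exit\n", getpid(), addr);
	getchar();
	munmap(addr, MAP_LEN);
	free(vec);
	close(fd);
	return 0;
}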
Matthew Wilcox (8):
mm: Convert an open-coded VM_BUG_ON_VMA
mm,fs,dax: Change ->pmd_fault to ->huge_fault
mm: Add support for PUD-sized transparent hugepages
mincore: Add support for PUDs
procfs: Add support for PUDs to smaps, clear_refs and pagemap
x86: Add support for PUD-sized transparent hugepages
dax: Support for transparent PUD pages
ext4: Support for PUD-sized transparent huge pages
Documentation/filesystems/dax.txt | 12 +-
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/paravirt.h | 11 ++
arch/x86/include/asm/paravirt_types.h | 2 +
arch/x86/include/asm/pgtable.h | 94 ++++++++++++
arch/x86/include/asm/pgtable_64.h | 13 ++
arch/x86/kernel/paravirt.c | 1 +
arch/x86/mm/pgtable.c | 31 ++++
fs/block_dev.c | 10 +-
fs/dax.c | 272 +++++++++++++++++++++++++---------
fs/ext2/file.c | 27 +---
fs/ext4/file.c | 60 +++-----
fs/proc/task_mmu.c | 109 ++++++++++++++
fs/xfs/xfs_file.c | 25 ++--
fs/xfs/xfs_trace.h | 2 +-
include/asm-generic/pgtable.h | 62 +++++++-
include/asm-generic/tlb.h | 14 ++
include/linux/dax.h | 17 ---
include/linux/huge_mm.h | 50 +++++++
include/linux/mm.h | 43 +++++-
include/linux/mmu_notifier.h | 13 ++
include/linux/pfn_t.h | 8 +
mm/huge_memory.c | 151 +++++++++++++++++++
mm/memory.c | 101 +++++++++++--
mm/mincore.c | 13 ++
mm/pagewalk.c | 19 ++-
mm/pgtable-generic.c | 14 ++
28 files changed, 980 insertions(+), 198 deletions(-)
--
2.6.4
[RFC PATCH] mm: support CONFIG_ZONE_DEVICE + CONFIG_ZONE_DMA
by Dan Williams
It appears devices requiring ZONE_DMA are still prevalent (see link
below). For this reason the proposal to require turning off ZONE_DMA to
enable ZONE_DEVICE is untenable in the short term. We want a single
kernel image to be able to support legacy devices as well as next
generation persistent memory platforms.
Towards this end, alias ZONE_DMA and ZONE_DEVICE to work around needing
to maintain a unique zone number for ZONE_DEVICE. Record the geometry
of ZONE_DMA at init (->init_spanned_pages) and use that information in
is_zone_device_page() to differentiate pages allocated via
devm_memremap_pages() vs true ZONE_DMA pages. Otherwise, use the
simpler definition of is_zone_device_page() when ZONE_DMA is turned off.
Note that this also teaches the memory hot-remove path that the zone may
not have sections for all pfn spans (->zone_dyn_start_pfn).
A user-visible implication of this change is a potentially unexpectedly
high "spanned" value in /proc/zoneinfo for the DMA zone.
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Jerome Glisse <j.glisse(a)gmail.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931
Fixes: 033fbae988fc ("mm: ZONE_DEVICE for "device memory"")
Reported-by: Sudip Mukherjee <sudipm.mukherjee(a)gmail.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++--------------
include/linux/mmzone.h | 24 ++++++++++++++++++++----
mm/Kconfig | 1 -
mm/memory_hotplug.c | 15 +++++++++++----
mm/page_alloc.c | 9 ++++++---
5 files changed, 69 insertions(+), 26 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f1cd22f2df1a..b4bccd3d3c41 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -664,12 +664,44 @@ static inline enum zone_type page_zonenum(const struct page *page)
return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+extern int page_to_nid(const struct page *page);
+#else
+static inline int page_to_nid(const struct page *page)
+{
+ return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
+}
+#endif
+
+static inline struct zone *page_zone(const struct page *page)
+{
+ return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
+}
+
#ifdef CONFIG_ZONE_DEVICE
void get_zone_device_page(struct page *page);
void put_zone_device_page(struct page *page);
static inline bool is_zone_device_page(const struct page *page)
{
+#ifndef CONFIG_ZONE_DMA
return page_zonenum(page) == ZONE_DEVICE;
+#else /* ZONE_DEVICE == ZONE_DMA */
+ struct zone *zone;
+
+ if (page_zonenum(page) != ZONE_DEVICE)
+ return false;
+
+ /*
+ * If ZONE_DEVICE is aliased with ZONE_DMA we need to check
+ * whether this was a dynamically allocated page from
+ * devm_memremap_pages() by checking against the size of
+ * ZONE_DMA at boot.
+ */
+ zone = page_zone(page);
+ if (page_to_pfn(page) <= zone_end_pfn_boot(zone))
+ return false;
+ return true;
+#endif
}
#else
static inline void get_zone_device_page(struct page *page)
@@ -735,15 +767,6 @@ static inline int zone_to_nid(struct zone *zone)
#endif
}
-#ifdef NODE_NOT_IN_PAGE_FLAGS
-extern int page_to_nid(const struct page *page);
-#else
-static inline int page_to_nid(const struct page *page)
-{
- return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
-}
-#endif
-
#ifdef CONFIG_NUMA_BALANCING
static inline int cpu_pid_to_cpupid(int cpu, int pid)
{
@@ -857,11 +880,6 @@ static inline bool cpupid_match_pid(struct task_struct *task, int cpupid)
}
#endif /* CONFIG_NUMA_BALANCING */
-static inline struct zone *page_zone(const struct page *page)
-{
- return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
-}
-
#ifdef SECTION_IN_PAGE_FLAGS
static inline void set_page_section(struct page *page, unsigned long section)
{
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 33bb1b19273e..a0ef09b7f893 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -288,6 +288,13 @@ enum zone_type {
*/
ZONE_DMA,
#endif
+#ifdef CONFIG_ZONE_DEVICE
+#ifndef CONFIG_ZONE_DMA
+ ZONE_DEVICE,
+#else
+ ZONE_DEVICE = ZONE_DMA,
+#endif
+#endif
#ifdef CONFIG_ZONE_DMA32
/*
* x86_64 needs two ZONE_DMAs because it supports devices that are
@@ -314,11 +321,7 @@ enum zone_type {
ZONE_HIGHMEM,
#endif
ZONE_MOVABLE,
-#ifdef CONFIG_ZONE_DEVICE
- ZONE_DEVICE,
-#endif
__MAX_NR_ZONES
-
};
#ifndef __GENERATING_BOUNDS_H
@@ -379,12 +382,19 @@ struct zone {
/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
unsigned long zone_start_pfn;
+ /* first dynamically added pfn of the zone */
+ unsigned long zone_dyn_start_pfn;
/*
* spanned_pages is the total pages spanned by the zone, including
* holes, which is calculated as:
* spanned_pages = zone_end_pfn - zone_start_pfn;
*
+ * init_spanned_pages is the boot/init time total pages spanned
+ * by the zone for differentiating statically assigned vs
+ * dynamically hot added memory to a zone.
+ * init_spanned_pages = init_zone_end_pfn - zone_start_pfn;
+ *
* present_pages is physical pages existing within the zone, which
* is calculated as:
* present_pages = spanned_pages - absent_pages(pages in holes);
@@ -423,6 +433,7 @@ struct zone {
*/
unsigned long managed_pages;
unsigned long spanned_pages;
+ unsigned long init_spanned_pages;
unsigned long present_pages;
const char *name;
@@ -546,6 +557,11 @@ static inline unsigned long zone_end_pfn(const struct zone *zone)
return zone->zone_start_pfn + zone->spanned_pages;
}
+static inline unsigned long zone_end_pfn_boot(const struct zone *zone)
+{
+ return zone->zone_start_pfn + zone->init_spanned_pages;
+}
+
static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
diff --git a/mm/Kconfig b/mm/Kconfig
index 97a4e06b15c0..08a92a9c8fbd 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -652,7 +652,6 @@ config IDLE_PAGE_TRACKING
config ZONE_DEVICE
bool "Device memory (pmem, etc...) hotplug support" if EXPERT
default !ZONE_DMA
- depends on !ZONE_DMA
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
depends on X86_64 #arch_add_memory() comprehends device memory
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4af58a3a8ffa..c3f0ff45bd47 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -300,6 +300,8 @@ static void __meminit grow_zone_span(struct zone *zone, unsigned long start_pfn,
zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
zone->zone_start_pfn;
+ if (!zone->zone_dyn_start_pfn || start_pfn < zone->zone_dyn_start_pfn)
+ zone->zone_dyn_start_pfn = start_pfn;
zone_span_writeunlock(zone);
}
@@ -601,8 +603,9 @@ static int find_biggest_section_pfn(int nid, struct zone *zone,
static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
unsigned long end_pfn)
{
- unsigned long zone_start_pfn = zone->zone_start_pfn;
+ unsigned long zone_start_pfn = zone->zone_dyn_start_pfn;
unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
+ bool dyn_zone = zone->zone_start_pfn == zone_start_pfn;
unsigned long zone_end_pfn = z;
unsigned long pfn;
struct mem_section *ms;
@@ -619,7 +622,9 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_smallest_section_pfn(nid, zone, end_pfn,
zone_end_pfn);
if (pfn) {
- zone->zone_start_pfn = pfn;
+ if (dyn_zone)
+ zone->zone_start_pfn = pfn;
+ zone->zone_dyn_start_pfn = pfn;
zone->spanned_pages = zone_end_pfn - pfn;
}
} else if (zone_end_pfn == end_pfn) {
@@ -661,8 +666,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
}
/* The zone has no valid section */
- zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ if (dyn_zone)
+ zone->zone_start_pfn = 0;
+ zone->zone_dyn_start_pfn = 0;
+ zone->spanned_pages = zone->init_spanned_pages;
zone_span_writeunlock(zone);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 63358d9f9aa9..2d8b1d602ff3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -209,6 +209,10 @@ EXPORT_SYMBOL(totalram_pages);
static char * const zone_names[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
"DMA",
+#else
+#ifdef CONFIG_ZONE_DEVICE
+ "Device",
+#endif
#endif
#ifdef CONFIG_ZONE_DMA32
"DMA32",
@@ -218,9 +222,6 @@ static char * const zone_names[MAX_NR_ZONES] = {
"HighMem",
#endif
"Movable",
-#ifdef CONFIG_ZONE_DEVICE
- "Device",
-#endif
};
compound_page_dtor * const compound_page_dtors[] = {
@@ -5082,6 +5083,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
node_start_pfn, node_end_pfn,
zholes_size);
zone->spanned_pages = size;
+ zone->init_spanned_pages = size;
+ zone->zone_dyn_start_pfn = 0;
zone->present_pages = real_size;
totalpages += size;
[PATCH 17/17] ACPI/EINJ: Allow memory error injection to NVDIMM
by Borislav Petkov
From: Toshi Kani <toshi.kani(a)hpe.com>
In the case of memory error injection, einj_error_inject() checks if
a target address is System RAM. Change this check to allow injecting
a memory error into NVDIMM memory by calling region_intersects() with
IORES_DESC_PERSISTENT_MEMORY. This enables memory error testing on both
System RAM and NVDIMM.
In addition, page_is_ram() is replaced with region_intersects() using
IORESOURCE_SYSTEM_RAM, so that the check verifies a target address range
of the requested size rather than a single page.
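For clarity, the base/size arithmetic in the hunk below works as in this
small sketch (the parameter values are made up):
#include <stdio.h>
#include <stdint.h>
int main(void)
{
	uint64_t param1 = 0x180001234ULL;	 /* requested injection address */
	uint64_t param2 = 0xfffffffffffff000ULL; /* page-granularity address mask */
	uint64_t base_addr = param1 & param2;	 /* 0x180001000: aligned base */
	uint64_t size = ~param2 + 1;		 /* 0x1000: one 4K page */
	printf("base=%#llx size=%#llx\n",
	       (unsigned long long)base_addr,
	       (unsigned long long)size);
	return 0;
}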
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Acked-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jarkko Nikula <jarkko.nikula(a)linux.intel.com>
Cc: Len Brown <lenb(a)kernel.org>
Cc: linux-acpi(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-mm <linux-mm(a)kvack.org>
Cc: linux-nvdimm(a)lists.01.org
Cc: "Rafael J. Wysocki" <rjw(a)rjwysocki.net>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Link: http://lkml.kernel.org/r/1452020081-26534-17-git-send-email-toshi.kani@hp...
Signed-off-by: Borislav Petkov <bp(a)suse.de>
---
drivers/acpi/apei/einj.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 0431883653be..559c1173de1c 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -519,7 +519,7 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
u64 param3, u64 param4)
{
int rc;
- unsigned long pfn;
+ u64 base_addr, size;
/* If user manually set "flags", make sure it is legal */
if (flags && (flags &
@@ -545,10 +545,17 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
/*
* Disallow crazy address masks that give BIOS leeway to pick
* injection address almost anywhere. Insist on page or
- * better granularity and that target address is normal RAM.
+ * better granularity and that target address is normal RAM or
+ * NVDIMM.
*/
- pfn = PFN_DOWN(param1 & param2);
- if (!page_is_ram(pfn) || ((param2 & PAGE_MASK) != PAGE_MASK))
+ base_addr = param1 & param2;
+ size = ~param2 + 1;
+
+ if (((param2 & PAGE_MASK) != PAGE_MASK) ||
+ ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
+ != REGION_INTERSECTS) &&
+ (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
+ != REGION_INTERSECTS)))
return -EINVAL;
inject:
--
2.3.5
[PATCH 14/17] x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
by Borislav Petkov
From: Toshi Kani <toshi.kani(a)hpe.com>
Change the callers of walk_iomem_res() that scan for the following
resources by name to use walk_iomem_res_desc() instead.
"ACPI Tables"
"ACPI Non-volatile Storage"
"Persistent Memory (legacy)"
"Crash kernel"
Note, the caller of walk_iomem_res() with "GART" will be removed in a
later patch.
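For reference, a minimal sketch of the desc-based walk (the callback body
is a made-up example; the real call sites are in the hunks below):
#include <linux/ioport.h>
#include <linux/printk.h>
/* Sketch only: visit every "ACPI Tables" range by descriptor rather
 * than by name. A non-zero return would stop the walk early.
 */
static int log_range(u64 start, u64 end, void *arg)
{
	int *count = arg;
	(*count)++;
	pr_info("ACPI tables: %#llx-%#llx\n", start, end);
	return 0;
}
static void example_walk(void)
{
	int count = 0;
	walk_iomem_res_desc(IORES_DESC_ACPI_TABLES,
			    IORESOURCE_MEM | IORESOURCE_BUSY,
			    0, -1, &count, log_range);
}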
Reviewed-by: Dave Young <dyoung(a)redhat.com>
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Don Zickus <dzickus(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: kexec(a)lists.infradead.org
Cc: "Lee, Chun-Yi" <joeyli.kernel(a)gmail.com>
Cc: linux-arch(a)vger.kernel.org
Cc: linux-mm <linux-mm(a)kvack.org>
Cc: linux-nvdimm(a)lists.01.org
Cc: Minfei Huang <mnfhuang(a)gmail.com>
Cc: "Peter Zijlstra (Intel)" <peterz(a)infradead.org>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: Takao Indoh <indou.takao(a)jp.fujitsu.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: x86-ml <x86(a)kernel.org>
Link: http://lkml.kernel.org/r/1452020081-26534-14-git-send-email-toshi.kani@hp...
Signed-off-by: Borislav Petkov <bp(a)suse.de>
---
arch/x86/kernel/crash.c | 4 ++--
arch/x86/kernel/pmem.c | 4 ++--
drivers/nvdimm/e820.c | 2 +-
kernel/kexec_file.c | 8 ++++----
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 58f34319b29a..35e152eeb6e0 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -599,12 +599,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
/* Add ACPI tables */
cmd.type = E820_ACPI;
flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- walk_iomem_res("ACPI Tables", flags, 0, -1, &cmd,
+ walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1, &cmd,
memmap_entry_callback);
/* Add ACPI Non-volatile Storage */
cmd.type = E820_NVS;
- walk_iomem_res("ACPI Non-volatile Storage", flags, 0, -1, &cmd,
+ walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
/* Add crashk_low_res region */
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 14415aff1813..92f70147a9a6 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -13,11 +13,11 @@ static int found(u64 start, u64 end, void *data)
static __init int register_e820_pmem(void)
{
- char *pmem = "Persistent Memory (legacy)";
struct platform_device *pdev;
int rc;
- rc = walk_iomem_res(pmem, IORESOURCE_MEM, 0, -1, NULL, found);
+ rc = walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY_LEGACY,
+ IORESOURCE_MEM, 0, -1, NULL, found);
if (rc <= 0)
return 0;
diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
index b0045a505dc8..95825b38559a 100644
--- a/drivers/nvdimm/e820.c
+++ b/drivers/nvdimm/e820.c
@@ -55,7 +55,7 @@ static int e820_pmem_probe(struct platform_device *pdev)
for (p = iomem_resource.child; p ; p = p->sibling) {
struct nd_region_desc ndr_desc;
- if (strncmp(p->name, "Persistent Memory (legacy)", 26) != 0)
+ if (p->desc != IORES_DESC_PERSISTENT_MEMORY_LEGACY)
continue;
memset(&ndr_desc, 0, sizeof(ndr_desc));
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 2bfcdc064116..56b18eb1f001 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -524,10 +524,10 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
/* Walk the RAM ranges and allocate a suitable range for the buffer */
if (image->type == KEXEC_TYPE_CRASH)
- ret = walk_iomem_res("Crash kernel",
- IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
- crashk_res.start, crashk_res.end, kbuf,
- locate_mem_hole_callback);
+ ret = walk_iomem_res_desc(crashk_res.desc,
+ IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+ crashk_res.start, crashk_res.end, kbuf,
+ locate_mem_hole_callback);
else
ret = walk_system_ram_res(0, -1, kbuf,
locate_mem_hole_callback);
--
2.3.5
[RFC PATCH] dax, ext2, ext4, XFS: fix data corruption race
by Ross Zwisler
With the current DAX code the following race exists:
Process 1                                     Process 2
---------                                     ---------
__dax_fault() - read file f, index 0
  get_block() -> returns hole
                                              __dax_fault() - write file f, index 0
                                                get_block() -> allocates blocks
                                                dax_insert_mapping()
  dax_load_hole()
  *data corruption*
An analogous race exists between __dax_fault() loading a hole and
__dax_pmd_fault() allocating a PMD DAX page and trying to insert it, and
that race also ends in data corruption.
One solution to this race was proposed by Jan Kara:
So we need some exclusion that makes sure pgoff->block mapping
information is uptodate at the moment we insert it into page tables. The
simplest reasonably fast thing I can see is:
When handling a read fault, things stay as is and filesystem protects the
fault with an equivalent of EXT4_I(inode)->i_mmap_sem held for reading.
When handling a write fault we first grab EXT4_I(inode)->i_mmap_sem for
reading and try a read fault. If __dax_fault() sees a hole returned from
get_blocks() during a write fault, it bails out. Filesystem grabs
EXT4_I(inode)->i_mmap_sem for writing and retries with different
get_blocks() callback which will allocate blocks. That way we get proper
exclusion for faults needing to allocate blocks.
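Expressed as a sketch, each filesystem's fault handler ends up with this
shape (the lock name varies per filesystem; this is the pattern, not any
one filesystem's code):
	/* Optimistic pass: shared lock, block allocation not allowed. */
	down_read(&ei->dax_sem);
	ret = __dax_fault(vma, vmf, get_block, NULL, false);
	up_read(&ei->dax_sem);
	if (ret == -EAGAIN) {
		/* The fault needs to allocate blocks: retry exclusively. */
		down_write(&ei->dax_sem);
		ret = __dax_fault(vma, vmf, get_block, NULL, true);
		up_write(&ei->dax_sem);
	}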
This patch adds this logic to DAX, ext2, ext4 and XFS. The changes for
these four components all appear in the same patch as opposed to being
spread out among multiple patches in a series because we are changing the
argument list to __dax_fault(), __dax_pmd_fault() and __dax_mkwrite().
This means we can't easily change things one component at a time and still
keep the series bisectable.
For ext4 this patch assumes that the journal entry is only needed when we
are actually allocating blocks with get_block(). An in-depth review of
this logic would be welcome.
I also fixed a bug in the ext4 implementation of ext4_dax_mkwrite().
Previously it assumed that the block allocation was already complete, so it
didn't create a journal entry. For a read that creates a zero page to
cover a hole followed by a write that actually allocates storage, this is
incorrect. The ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path
would call get_blocks() to allocate storage, so I'm pretty sure we need a
journal entry here.
With that fixed, I noticed that for both ext2 and ext4 the .fault and
.page_mkwrite vmops paths were exactly the same, so I removed
ext4_dax_mkwrite() and ext2_dax_mkwrite() and just use ext4_dax_fault()
and ext2_dax_fault() directly instead.
I'm still in the process of testing this patch, which is part of the reason
why it is marked as RFC. I know of at least one deadlock that is easily
hit by doing a read of a hole followed by a write which allocates storage.
If you're using xfstests you can hit this easily with generic/075 with any
of the three filesytems. I'll continue to track this down, but I wanted to
send out this RFC to sanity check the general approach.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Suggested-by: Jan Kara <jack(a)suse.cz>
---
fs/block_dev.c | 19 ++++++++++--
fs/dax.c | 20 ++++++++++---
fs/ext2/file.c | 41 ++++++++++++-------------
fs/ext4/file.c | 86 +++++++++++++++++++++++++----------------------------
fs/xfs/xfs_file.c | 28 +++++++++++++----
include/linux/dax.h | 8 +++--
6 files changed, 121 insertions(+), 81 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 303b7cd..775f1b0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1733,13 +1733,28 @@ static const struct address_space_operations def_blk_aops = {
*/
static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
- return __dax_fault(vma, vmf, blkdev_get_block, NULL);
+ int ret;
+
+ ret = __dax_fault(vma, vmf, blkdev_get_block, NULL, false);
+
+ if (WARN_ON_ONCE(ret == -EAGAIN))
+ ret = VM_FAULT_SIGBUS;
+
+ return ret;
}
static int blkdev_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags)
{
- return __dax_pmd_fault(vma, addr, pmd, flags, blkdev_get_block, NULL);
+ int ret;
+
+ ret = __dax_pmd_fault(vma, addr, pmd, flags, blkdev_get_block, NULL,
+ false);
+
+ if (WARN_ON_ONCE(ret == -EAGAIN))
+ ret = VM_FAULT_SIGBUS;
+
+ return ret;
}
static void blkdev_vm_open(struct vm_area_struct *vma)
diff --git a/fs/dax.c b/fs/dax.c
index 206650f..7a927eb 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -582,13 +582,19 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
* extent mappings from @get_block, but it is optional for reads as
* dax_insert_mapping() will always zero unwritten blocks. If the fs does
* not support unwritten extents, the it should pass NULL.
+ * @alloc_ok: True if our caller is holding a lock that isolates us from other
+ * DAX faults on the same inode. This allows us to allocate new storage
+ * with get_block() and not have to worry about races with other fault
+ * handlers. If this is unset and we need to allocate storage we will
+ * return -EAGAIN to ask our caller to retry with the proper locking.
*
* When a page fault occurs, filesystems may call this helper in their
* fault handler for DAX files. __dax_fault() assumes the caller has done all
* the necessary locking for the page fault to proceed successfully.
*/
int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
- get_block_t get_block, dax_iodone_t complete_unwritten)
+ get_block_t get_block, dax_iodone_t complete_unwritten,
+ bool alloc_ok)
{
struct file *file = vma->vm_file;
struct address_space *mapping = file->f_mapping;
@@ -642,6 +648,9 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
if (!buffer_mapped(&bh) && !buffer_unwritten(&bh) && !vmf->cow_page) {
if (vmf->flags & FAULT_FLAG_WRITE) {
+ if (!alloc_ok)
+ return -EAGAIN;
+
error = get_block(inode, block, &bh, 1);
count_vm_event(PGMAJFAULT);
mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
@@ -745,7 +754,7 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
sb_start_pagefault(sb);
file_update_time(vma->vm_file);
}
- result = __dax_fault(vma, vmf, get_block, complete_unwritten);
+ result = __dax_fault(vma, vmf, get_block, complete_unwritten, false);
if (vmf->flags & FAULT_FLAG_WRITE)
sb_end_pagefault(sb);
@@ -780,7 +789,7 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address,
int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, unsigned int flags, get_block_t get_block,
- dax_iodone_t complete_unwritten)
+ dax_iodone_t complete_unwritten, bool alloc_ok)
{
struct file *file = vma->vm_file;
struct address_space *mapping = file->f_mapping;
@@ -836,6 +845,9 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
return VM_FAULT_SIGBUS;
if (!buffer_mapped(&bh) && write) {
+ if (!alloc_ok)
+ return -EAGAIN;
+
if (get_block(inode, block, &bh, 1) != 0)
return VM_FAULT_SIGBUS;
alloc = true;
@@ -1017,7 +1029,7 @@ int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
file_update_time(vma->vm_file);
}
result = __dax_pmd_fault(vma, address, pmd, flags, get_block,
- complete_unwritten);
+ complete_unwritten, false);
if (flags & FAULT_FLAG_WRITE)
sb_end_pagefault(sb);
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 2c88d68..1106a9e 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -49,11 +49,17 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
sb_start_pagefault(inode->i_sb);
file_update_time(vma->vm_file);
}
+
down_read(&ei->dax_sem);
+ ret = __dax_fault(vma, vmf, ext2_get_block, NULL, false);
+ up_read(&ei->dax_sem);
- ret = __dax_fault(vma, vmf, ext2_get_block, NULL);
+ if (ret == -EAGAIN) {
+ down_write(&ei->dax_sem);
+ ret = __dax_fault(vma, vmf, ext2_get_block, NULL, true);
+ up_write(&ei->dax_sem);
+ }
- up_read(&ei->dax_sem);
if (vmf->flags & FAULT_FLAG_WRITE)
sb_end_pagefault(inode->i_sb);
return ret;
@@ -70,33 +76,24 @@ static int ext2_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
sb_start_pagefault(inode->i_sb);
file_update_time(vma->vm_file);
}
+
down_read(&ei->dax_sem);
+ ret = __dax_pmd_fault(vma, addr, pmd, flags, ext2_get_block, NULL,
+ false);
+ up_read(&ei->dax_sem);
- ret = __dax_pmd_fault(vma, addr, pmd, flags, ext2_get_block, NULL);
+ if (ret == -EAGAIN) {
+ down_write(&ei->dax_sem);
+ ret = __dax_pmd_fault(vma, addr, pmd, flags, ext2_get_block,
+ NULL, true);
+ up_write(&ei->dax_sem);
+ }
- up_read(&ei->dax_sem);
if (flags & FAULT_FLAG_WRITE)
sb_end_pagefault(inode->i_sb);
return ret;
}
-static int ext2_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
- struct inode *inode = file_inode(vma->vm_file);
- struct ext2_inode_info *ei = EXT2_I(inode);
- int ret;
-
- sb_start_pagefault(inode->i_sb);
- file_update_time(vma->vm_file);
- down_read(&ei->dax_sem);
-
- ret = __dax_mkwrite(vma, vmf, ext2_get_block, NULL);
-
- up_read(&ei->dax_sem);
- sb_end_pagefault(inode->i_sb);
- return ret;
-}
-
static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
struct vm_fault *vmf)
{
@@ -124,7 +121,7 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
static const struct vm_operations_struct ext2_dax_vm_ops = {
.fault = ext2_dax_fault,
.pmd_fault = ext2_dax_pmd_fault,
- .page_mkwrite = ext2_dax_mkwrite,
+ .page_mkwrite = ext2_dax_fault,
.pfn_mkwrite = ext2_dax_pfn_mkwrite,
};
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index fa899c9..abddc8a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -204,24 +204,30 @@ static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
if (write) {
sb_start_pagefault(sb);
file_update_time(vma->vm_file);
- down_read(&EXT4_I(inode)->i_mmap_sem);
- handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
- EXT4_DATA_TRANS_BLOCKS(sb));
- } else
- down_read(&EXT4_I(inode)->i_mmap_sem);
+ }
- if (IS_ERR(handle))
- result = VM_FAULT_SIGBUS;
- else
- result = __dax_fault(vma, vmf, ext4_dax_mmap_get_block, NULL);
+ down_read(&EXT4_I(inode)->i_mmap_sem);
+ result = __dax_fault(vma, vmf, ext4_dax_mmap_get_block, NULL,
+ false);
+ up_read(&EXT4_I(inode)->i_mmap_sem);
- if (write) {
- if (!IS_ERR(handle))
+ if (result == -EAGAIN) {
+ down_write(&EXT4_I(inode)->i_mmap_sem);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
+ EXT4_DATA_TRANS_BLOCKS(sb));
+
+ if (IS_ERR(handle))
+ result = VM_FAULT_SIGBUS;
+ else {
+ result = __dax_fault(vma, vmf,
+ ext4_dax_mmap_get_block, NULL, true);
ext4_journal_stop(handle);
- up_read(&EXT4_I(inode)->i_mmap_sem);
+ }
+ up_write(&EXT4_I(inode)->i_mmap_sem);
+ }
+
+ if (write)
sb_end_pagefault(sb);
- } else
- up_read(&EXT4_I(inode)->i_mmap_sem);
return result;
}
@@ -238,47 +244,37 @@ static int ext4_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
if (write) {
sb_start_pagefault(sb);
file_update_time(vma->vm_file);
- down_read(&EXT4_I(inode)->i_mmap_sem);
+ }
+
+ down_read(&EXT4_I(inode)->i_mmap_sem);
+ result = __dax_pmd_fault(vma, addr, pmd, flags,
+ ext4_dax_mmap_get_block, NULL, false);
+ up_read(&EXT4_I(inode)->i_mmap_sem);
+
+ if (result == -EAGAIN) {
+ down_write(&EXT4_I(inode)->i_mmap_sem);
handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
ext4_chunk_trans_blocks(inode,
PMD_SIZE / PAGE_SIZE));
- } else
- down_read(&EXT4_I(inode)->i_mmap_sem);
- if (IS_ERR(handle))
- result = VM_FAULT_SIGBUS;
- else
- result = __dax_pmd_fault(vma, addr, pmd, flags,
- ext4_dax_mmap_get_block, NULL);
-
- if (write) {
- if (!IS_ERR(handle))
+ if (IS_ERR(handle))
+ result = VM_FAULT_SIGBUS;
+ else {
+ result = __dax_pmd_fault(vma, addr, pmd, flags,
+ ext4_dax_mmap_get_block, NULL, true);
ext4_journal_stop(handle);
- up_read(&EXT4_I(inode)->i_mmap_sem);
+ }
+ up_write(&EXT4_I(inode)->i_mmap_sem);
+ }
+
+ if (write)
sb_end_pagefault(sb);
- } else
- up_read(&EXT4_I(inode)->i_mmap_sem);
return result;
}
-static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
- int err;
- struct inode *inode = file_inode(vma->vm_file);
-
- sb_start_pagefault(inode->i_sb);
- file_update_time(vma->vm_file);
- down_read(&EXT4_I(inode)->i_mmap_sem);
- err = __dax_mkwrite(vma, vmf, ext4_dax_mmap_get_block, NULL);
- up_read(&EXT4_I(inode)->i_mmap_sem);
- sb_end_pagefault(inode->i_sb);
-
- return err;
-}
-
/*
- * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_mkwrite()
+ * Handle write fault for VM_MIXEDMAP mappings. Similarly to ext4_dax_fault()
* handler we check for races agaist truncate. Note that since we cycle through
* i_mmap_sem, we are sure that also any hole punching that began before we
* were called is finished by now and so if it included part of the file we
@@ -311,7 +307,7 @@ static int ext4_dax_pfn_mkwrite(struct vm_area_struct *vma,
static const struct vm_operations_struct ext4_dax_vm_ops = {
.fault = ext4_dax_fault,
.pmd_fault = ext4_dax_pmd_fault,
- .page_mkwrite = ext4_dax_mkwrite,
+ .page_mkwrite = ext4_dax_fault,
.pfn_mkwrite = ext4_dax_pfn_mkwrite,
};
#else
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 55e16e2..81edbd4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1523,16 +1523,26 @@ xfs_filemap_page_mkwrite(
sb_start_pagefault(inode->i_sb);
file_update_time(vma->vm_file);
- xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
if (IS_DAX(inode)) {
- ret = __dax_mkwrite(vma, vmf, xfs_get_blocks_dax_fault, NULL);
+ xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+ ret = __dax_mkwrite(vma, vmf, xfs_get_blocks_dax_fault, NULL,
+ false);
+ xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+
+ if (ret == -EAGAIN) {
+ xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_EXCL);
+ ret = __dax_mkwrite(vma, vmf,
+ xfs_get_blocks_dax_fault, NULL, true);
+ xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_EXCL);
+ }
} else {
+ xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
ret = block_page_mkwrite(vma, vmf, xfs_get_blocks);
ret = block_page_mkwrite_return(ret);
+ xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
}
- xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
sb_end_pagefault(inode->i_sb);
return ret;
@@ -1560,7 +1570,8 @@ xfs_filemap_fault(
* changes to xfs_get_blocks_direct() to map unwritten extent
* ioend for conversion on read-only mappings.
*/
- ret = __dax_fault(vma, vmf, xfs_get_blocks_dax_fault, NULL);
+ ret = __dax_fault(vma, vmf, xfs_get_blocks_dax_fault, NULL,
+ false);
} else
ret = filemap_fault(vma, vmf);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
@@ -1598,9 +1609,16 @@ xfs_filemap_pmd_fault(
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
ret = __dax_pmd_fault(vma, addr, pmd, flags, xfs_get_blocks_dax_fault,
- NULL);
+ NULL, false);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+ if (ret == -EAGAIN) {
+ xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_EXCL);
+ ret = __dax_pmd_fault(vma, addr, pmd, flags,
+ xfs_get_blocks_dax_fault, NULL, true);
+ xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_EXCL);
+ }
+
if (flags & FAULT_FLAG_WRITE)
sb_end_pagefault(inode->i_sb);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8204c3d..783a2b6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -13,12 +13,13 @@ int dax_truncate_page(struct inode *, loff_t from, get_block_t);
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
dax_iodone_t);
int __dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
- dax_iodone_t);
+ dax_iodone_t, bool alloc_ok);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
int dax_pmd_fault(struct vm_area_struct *, unsigned long addr, pmd_t *,
unsigned int flags, get_block_t, dax_iodone_t);
int __dax_pmd_fault(struct vm_area_struct *, unsigned long addr, pmd_t *,
- unsigned int flags, get_block_t, dax_iodone_t);
+ unsigned int flags, get_block_t, dax_iodone_t,
+ bool alloc_ok);
#else
static inline int dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags, get_block_t gb,
@@ -30,7 +31,8 @@ static inline int dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
#endif
int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
#define dax_mkwrite(vma, vmf, gb, iod) dax_fault(vma, vmf, gb, iod)
-#define __dax_mkwrite(vma, vmf, gb, iod) __dax_fault(vma, vmf, gb, iod)
+#define __dax_mkwrite(vma, vmf, gb, iod, alloc) \
+ __dax_fault(vma, vmf, gb, iod, alloc)
static inline bool vma_is_dax(struct vm_area_struct *vma)
{
--
2.5.0
[PATCH v3 00/17] Enhance iomem search interfaces and support EINJ to NVDIMM
by Toshi Kani
This patch-set enhances the iomem table and its search interfaces, and
then changes EINJ to support NVDIMM.
- Patches 1-2 add a new System RAM type, IORESOURCE_SYSTEM_RAM, and
make the iomem search interfaces work with resource flags with
modifier bits set. IORESOURCE_SYSTEM_RAM has the IORESOURCE_MEM bit set
for backward compatibility.
- Patch 3 adds a new field, an I/O resource descriptor, to struct resource.
Drivers can assign their unique descriptor to a range when they support
the iomem search interfaces (see the sketch after this list).
- Patches 4-9 change the initialization of resource entries. They set
the System RAM type on System RAM ranges, set I/O resource descriptors
on the regions targeted by the iomem search interfaces, and switch
to kzalloc() where kmalloc() was used to allocate struct resource
ranges.
- Patches 10-14 extend the iomem interfaces to check System RAM ranges
with the System RAM type and the I/O resource descriptor.
- Patches 15-16 remove the deprecated walk_iomem_res().
- Patch 17 changes the EINJ driver to allow injecting a memory error
to NVDIMM.
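As a rough sketch of the combined result (illustration only, not code
from the series): a driver tags a range with a descriptor when inserting
it, and consumers can then match by type and descriptor instead of
comparing name strings:
#include <linux/ioport.h>
/* Made-up physical range, tagged at insertion time. */
static struct resource example_res = {
	.name	= "Persistent Memory (legacy)",
	.start	= 0x100000000ULL,
	.end	= 0x17fffffffULL,
	.flags	= IORESOURCE_MEM,
	.desc	= IORES_DESC_PERSISTENT_MEMORY_LEGACY,
};
/* Later, test a candidate range by type + descriptor. */
static bool range_is_legacy_pmem(u64 base, u64 size)
{
	return region_intersects(base, size, IORESOURCE_MEM,
			IORES_DESC_PERSISTENT_MEMORY_LEGACY)
		== REGION_INTERSECTS;
}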
---
v3:
- Remove the walk_iomem_res() call with "GART" in crash.c since it is
no longer needed. Then kill walk_iomem_res(). (Borislav Petkov,
Dave Young)
- Change to use crashk_res.desc at the walk_iomem_res_desc() call in
kexec_add_buffer(). (Minfei Huang)
v2:
- Add 'desc' to struct resource, and add a new iomem interface to
search with the desc. (Borislav Petkov)
- Add a check to checkpatch.pl to warn on new use of walk_iomem_res().
(Borislav Petkov)
v1:
- Searching for System RAM in the resource table should not require
strcmp(). (Borislav Petkov)
- Add a new System RAM type as a modifier to IORESOURCE_MEM.
(Linus Torvalds)
- NVDIMM check needs to be able to distinguish legacy and NFIT pmem
ranges. (Dan Williams)
---
Toshi Kani (17):
01/17 resource: Add System RAM resource type
02/17 resource: make resource flags handled properly
03/17 resource: Add I/O resource descriptor
04/17 x86/e820: Set System RAM type and descriptor
05/17 ia64: Set System RAM type and descriptor
06/17 arch: Set IORESOURCE_SYSTEM_RAM to System RAM
07/17 kexec: Set IORESOURCE_SYSTEM_RAM to System RAM
08/17 xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
09/17 drivers: Initialize resource entry to zero
10/17 resource: Change walk_system_ram to use System RAM type
11/17 arm/samsung: Change s3c_pm_run_res() to use System RAM type
12/17 memremap: Change region_intersects() to take @flags and @desc
13/17 resource: Add walk_iomem_res_desc()
14/17 x86,nvdimm,kexec: Use walk_iomem_res_desc() for iomem search
15/17 x86/kexec: Remove walk_iomem_res() call with GART
16/17 resource: Kill walk_iomem_res()
17/17 ACPI/EINJ: Allow memory error injection to NVDIMM
---
arch/arm/kernel/setup.c | 6 +--
arch/arm/plat-samsung/pm-check.c | 4 +-
arch/arm64/kernel/setup.c | 6 +--
arch/avr32/kernel/setup.c | 6 +--
arch/ia64/kernel/efi.c | 13 ++++--
arch/ia64/kernel/setup.c | 6 +--
arch/m32r/kernel/setup.c | 4 +-
arch/mips/kernel/setup.c | 10 +++--
arch/parisc/mm/init.c | 6 +--
arch/powerpc/mm/mem.c | 2 +-
arch/s390/kernel/setup.c | 8 ++--
arch/score/kernel/setup.c | 2 +-
arch/sh/kernel/setup.c | 8 ++--
arch/sparc/mm/init_64.c | 8 ++--
arch/tile/kernel/setup.c | 11 +++--
arch/unicore32/kernel/setup.c | 6 +--
arch/x86/kernel/crash.c | 41 ++-----------------
arch/x86/kernel/e820.c | 38 ++++++++++++++++-
arch/x86/kernel/pmem.c | 4 +-
arch/x86/kernel/setup.c | 6 +--
drivers/acpi/acpi_platform.c | 2 +-
drivers/acpi/apei/einj.c | 15 +++++--
drivers/nvdimm/e820.c | 2 +-
drivers/parisc/eisa_enumerator.c | 4 +-
drivers/rapidio/rio.c | 8 ++--
drivers/sh/superhyway/superhyway.c | 2 +-
drivers/xen/balloon.c | 2 +-
include/linux/ioport.h | 33 ++++++++++++++-
include/linux/mm.h | 3 +-
kernel/kexec_core.c | 8 ++--
kernel/kexec_file.c | 8 ++--
kernel/memremap.c | 13 +++---
kernel/resource.c | 83 ++++++++++++++++++++++----------------
mm/memory_hotplug.c | 2 +-
34 files changed, 225 insertions(+), 155 deletions(-)