On Thu, Sep 27, 2018 at 03:13:29PM +0200, Michal Hocko wrote:
> I would have to double check but is the hotplug lock really
> serializing access to the state initialized by init_currently_empty_zone? E.g.
> zone_start_pfn is a nice example of a state that is used outside of the
> lock. zone's free lists are similar. So do we really need the hotplug
> lock? And more broadly, what is the hotplug lock supposed to
> serialize in general? Proper documentation would surely help to answer
> these questions. There is way too much of a "do not touch this code and
> just make my particular hack" mindset, which has made the whole memory
> hotplug a giant pile of mess. We really should start with some proper
> engineering here finally.
* Locking rules:
* zone_start_pfn and spanned_pages are protected by span_seqlock.
* It is a seqlock because it has to be read outside of zone->lock,
* and it is done in the main allocator path. But, it is written
* quite infrequently.
* Write access to present_pages at runtime should be protected by
* mem_hotplug_begin/end(). Any reader who can't tolerant drift of
* present_pages should get_online_mems() to get a stable value.
IIUC, it looks like zone_start_pfn should be wrapped in
zone_span_writelock/zone_span_writeunlock, and since zone_start_pfn is changed
in init_currently_empty_zone, I guess that the whole function should be within
that lock.
So, a blind shot, but could we do something like the following?
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e1f816821..49f87252f1b1 100644
@@ -764,14 +764,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long
int nid = pgdat->node_id;
unsigned long flags;
- if (zone_is_empty(zone))
- init_currently_empty_zone(zone, start_pfn, nr_pages);
/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
+ if (zone_is_empty(zone))
+ init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
resize_pgdat_range(pgdat, start_pfn, nr_pages);
Then, we could take move_pfn_range_to_zone out of the hotplug lock.
Although I am not sure about leaving memmap_init_zone unprotected.
For normal memory, that is not a problem, since the memblock's lock
protects us from touching the same pages at the same time in online/offline_pages,
but for HMM/devm the story is different.
I am totally unaware of HMM/devm, so I am not sure if it is protected somehow.
E.g., what happens if devm_memremap_pages and devm_memremap_pages_release are running
at the same time for the same memory range (with the assumption that the hotplug lock
does not protect move_pfn_range_to_zone anymore)?
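
To make the intent a bit more concrete, here is a rough userspace model of
what the span_seqlock rules quoted above ask for: the empty-zone init and the
resize happen inside a single write section, so a lockless reader either sees
a consistent (zone_start_pfn, spanned_pages) pair or is told to retry. This is
not kernel code. The struct and all helper names are made up for illustration,
it uses C11 atomics instead of the kernel seqlock API, and it assumes writers
are already serialized against each other by something else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only: every name below is made up for illustration. */
struct zone_model {
	atomic_uint span_seq;		/* even: stable, odd: write in progress */
	atomic_ulong zone_start_pfn;
	atomic_ulong spanned_pages;
};

static void zone_model_init(struct zone_model *z)
{
	atomic_init(&z->span_seq, 0);
	atomic_init(&z->zone_start_pfn, 0);
	atomic_init(&z->spanned_pages, 0);
}

/* Write side. Assumes writers are already serialized against each other. */
static void span_write_begin(struct zone_model *z)
{
	unsigned int s = atomic_load_explicit(&z->span_seq, memory_order_relaxed);

	atomic_store_explicit(&z->span_seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void span_write_end(struct zone_model *z)
{
	unsigned int s = atomic_load_explicit(&z->span_seq, memory_order_relaxed);

	atomic_store_explicit(&z->span_seq, s + 1, memory_order_release);
}

/* One lockless read attempt; false means "raced with a writer, retry". */
static bool span_try_read(struct zone_model *z, unsigned long *start,
			  unsigned long *spanned)
{
	unsigned int s = atomic_load_explicit(&z->span_seq, memory_order_acquire);

	if (s & 1)
		return false;
	*start = atomic_load_explicit(&z->zone_start_pfn, memory_order_relaxed);
	*spanned = atomic_load_explicit(&z->spanned_pages, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&z->span_seq, memory_order_relaxed) == s;
}

/* Initialise an empty zone or grow an existing one in one write section. */
static void model_move_pfn_range(struct zone_model *z, unsigned long start_pfn,
				 unsigned long nr_pages)
{
	unsigned long start, spanned, end;

	span_write_begin(z);
	start = atomic_load_explicit(&z->zone_start_pfn, memory_order_relaxed);
	spanned = atomic_load_explicit(&z->spanned_pages, memory_order_relaxed);
	end = start + spanned;

	/* spanned == 0 models zone_is_empty(): take the new range as-is. */
	if (spanned == 0 || start_pfn < start)
		start = start_pfn;
	if (spanned == 0 || start_pfn + nr_pages > end)
		end = start_pfn + nr_pages;

	atomic_store_explicit(&z->zone_start_pfn, start, memory_order_relaxed);
	atomic_store_explicit(&z->spanned_pages, end - start, memory_order_relaxed);
	span_write_end(z);
}
```

A real reader would loop on span_try_read() until it returns true. Note that
the model says nothing about memmap_init_zone or the devm case, which is
exactly the part I am unsure about.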