On Tue, 12 Jan 2021, Zhongwei Cai wrote:
> I'm working with Mingkai on optimizations for Ext4-dax.
What specific patch are you working on? Please, post it somewhere.
> We think that optimizing the read-iter method cannot achieve the
> same performance as the read method for Ext4-dax.
> We tried Mikulas's benchmark on Ext4-dax. The overall time and perf
> results are listed below:
> Overall time of 2^26 4KB reads:
>
> Method     Time
> read       26.782s
> read-iter  36.477s
What happens if you use this trick ( https://lkml.org/lkml/2021/1/11/1612 )
- detect in the "read_iter" method that there is just one segment and
treat it like a "read" method. I think that it should improve performance
for your case.
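The trick above can be sketched as a userspace model (this is not kernel code; the struct and function names are illustrative stand-ins for the kernel's iov_iter and the ->read_iter method): when the iter holds a single segment, skip the per-segment iteration machinery and do the one straight copy a plain ->read method would do.

```c
/* Userspace model of the suggested single-segment fast path.
 * In the kernel this would correspond to checking nr_segs == 1 in the
 * ->read_iter method; the types here are simplified stand-ins. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct iovec_model {
	char  *base;
	size_t len;
};

struct iter_model {
	const struct iovec_model *segs;
	size_t nr_segs;
};

static size_t read_into_iter(const char *src, size_t len,
			     const struct iter_model *it)
{
	if (it->nr_segs == 1) {
		/* Fast path: one user buffer, one copy -- what the
		 * classic ->read method does. */
		size_t n = it->segs[0].len < len ? it->segs[0].len : len;
		memcpy(it->segs[0].base, src, n);
		return n;
	}

	/* Slow path: walk every segment of the iter. */
	size_t done = 0;
	for (size_t i = 0; i < it->nr_segs && done < len; i++) {
		size_t n = it->segs[i].len;
		if (n > len - done)
			n = len - done;
		memcpy(it->segs[i].base, src + done, n);
		done += n;
	}
	return done;
}
```

Since a read(2) call always arrives as exactly one segment, the fast path should cover the benchmark above without changing behavior for true vectored I/O.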
Here's a collection of test updates. It adds support for regression
testing pfn_to_online_page(), which suffered from a lack of precision in
mixed-zone memory-sections, updates the mremap() regression test to
accept failure as an option (the behavior in v5.11-rc1+), fixes a
warning, and ditches an 'out' label.
Dan Williams (4):
ndctl/test: Fix btt expect table compile warning
ndctl/test: Cleanup unnecessary out label
ndctl/test: Fix device-dax mremap() test
ndctl/test: Exercise soft_offline_page() corner cases
test/dax-pmd.c | 17 +++++++++--------
test/dax-poison.c | 19 +++++++++++++++++++
test/device-dax.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
test/libndctl.c | 12 ++++++------
4 files changed, 79 insertions(+), 14 deletions(-)
Changes since v1:
- Clarify the failing condition in patch 3 (Michal)
- Clarify how subsection collisions manifest in shipping systems
- Use zone_idx() (Michal)
- Move section_taint_zone_device() conditions to
- Fix pfn_to_online_page() to account for pfn_valid() vs
pfn_section_valid() confusion (David)
Michal reminds that the discussion about how to ensure that pfn-walkers
do not get confused by ZONE_DEVICE pages was never resolved. A pfn-walker
that uses pfn_to_online_page() may inadvertently translate a pfn as
online and in the page allocator when it is in fact offline and managed
by a ZONE_DEVICE mapping (details in patch 3: "mm: Teach
pfn_to_online_page() about ZONE_DEVICE section collisions").
The two proposals under consideration are to teach pfn_to_online_page()
to be precise in the presence of mixed-zone sections, or to teach the
memory-add code to drop the System RAM associated with ZONE_DEVICE
collisions. In order not to regress memory capacity by a few 10s to 100s
of MiB, the approach taken in this set is to add precision to
pfn_to_online_page().
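The precision being added can be illustrated with a userspace model (not the kernel implementation; the section layout, bitmap width, and function names are all simplified stand-ins): a section-granularity online check answers wrongly for a section that mixes System RAM with ZONE_DEVICE, while a check that also consults a subsection-validity map does not.

```c
/* Userspace model of section- vs subsection-granularity online checks.
 * Illustrative only: the kernel tracks this via SECTION_IS_ONLINE and a
 * per-section subsection map; the names and sizes here are made up. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBSECTIONS_PER_SECTION 16	/* illustrative granularity */

struct section_model {
	bool online;			/* whole-section online flag */
	uint16_t subsection_map;	/* bit set => System RAM subsection */
};

/* Naive check: section granularity only.  A ZONE_DEVICE subsection
 * inside an otherwise-online section is wrongly reported online. */
static bool pfn_online_naive(const struct section_model *s,
			     unsigned int subsec)
{
	(void)subsec;
	return s->online;
}

/* Precise check: the subsection must also be valid System RAM. */
static bool pfn_online_precise(const struct section_model *s,
			       unsigned int subsec)
{
	return s->online && (s->subsection_map & (1u << subsec));
}
```

With a section whose low half is System RAM and whose high half is claimed by a ZONE_DEVICE mapping, the naive check reports the ZONE_DEVICE half as online while the precise check rejects it, which is the class of collision patch 3 addresses.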
In the course of validating pfn_to_online_page(), a couple of other
fixes fell out:
1/ soft_offline_page() fails to drop the reference taken in the
madvise(..., MADV_SOFT_OFFLINE) case.
2/ The libnvdimm sysfs attribute visibility code was failing to publish
the resource base for memmap=ss!nn defined namespaces. This is needed
for the regression test for soft_offline_page().
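The reference-count bug in item 1 can be modeled in userspace (a sketch, not the kernel code: the refcount plumbing and function names are invented for illustration) as an error path that returns without dropping a reference the caller's path took.

```c
/* Userspace model of the soft_offline_page() reference leak: the
 * MADV_SOFT_OFFLINE path takes a page reference, and the failure path
 * must drop it before returning.  All names here are illustrative. */
#include <assert.h>
#include <stdbool.h>

struct page_model {
	int refcount;
};

static void get_page_m(struct page_model *p) { p->refcount++; }
static void put_page_m(struct page_model *p) { p->refcount--; }

/* Fixed behavior: the reference is balanced on every exit path. */
static int soft_offline_fixed(struct page_model *p, bool will_fail)
{
	get_page_m(p);		/* reference taken on the madvise path */
	if (will_fail) {
		put_page_m(p);	/* the put the buggy error path omitted */
		return -1;
	}
	put_page_m(p);
	return 0;
}
```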
Dan Williams (5):
mm: Move pfn_to_online_page() out of line
mm: Teach pfn_to_online_page() to consider subsection validity
mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
mm: Fix page reference leak in soft_offline_page()
libnvdimm/namespace: Fix visibility of namespace resource attribute
drivers/nvdimm/namespace_devs.c | 10 +++---
include/linux/memory_hotplug.h | 17 +----------
include/linux/mmzone.h | 22 +++++++++-----
mm/memory-failure.c | 20 ++++++++++---
mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++++++++++
5 files changed, 99 insertions(+), 32 deletions(-)
From: Zhongwei Cai
> Sent: 12 January 2021 13:45
> The overhead mainly consists of two parts. The first is constructing
> struct iov_iter and iterating it (i.e., new_sync, _copy_mc_to_iter and
> iov_iter_init). The second is the dax io mechanism provided by VFS (i.e.,
> dax_iomap_rw, iomap_apply and ext4_iomap_begin).
Setting up an iov_iter with a single buffer ought to be relatively
cheap compared to a file system read.
The iteration should be over the total length
calling copy_from/to_iter() for 'chunks' that don't
depend on the size of the iov fragments.
So copy_to/from_iter() should directly replace the copy_to/from_user()
calls in the 'read' method.
For a single buffer this really ought to be noise as well.
Clearly, if the iov has a lot of short fragments the copy
will be more expensive.
Access to /dev/null and /dev/zero is much more likely to show
the additional costs of the iov_iter code than fs code.
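The fragment-cost point can be made concrete with a userspace model (illustrative only; this is not the kernel's copy_to_iter(), just a loop with the same shape) that counts how many copy operations a buffer of a given total length takes: a single large buffer degenerates to one copy, while many short fragments cost one copy each.

```c
/* Userspace model of fragment-bounded copying, counting copy calls.
 * The struct and function names are invented for this sketch. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct frag {
	char  *base;
	size_t len;
};

static size_t copy_to_frags(const char *src, size_t len,
			    const struct frag *frags, size_t nr,
			    size_t *ncopies)
{
	size_t done = 0;

	*ncopies = 0;
	for (size_t i = 0; i < nr && done < len; i++) {
		size_t n = frags[i].len;

		if (n > len - done)
			n = len - done;
		memcpy(frags[i].base, src + done, n);
		done += n;
		(*ncopies)++;	/* one copy per fragment touched */
	}
	return done;
}
```

For a 4KB read, one 4KB fragment means one copy, while 64 fragments of 64 bytes mean 64 copies of the same total data, which is where the iov_iter overhead shows up.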