On Thu, Jul 27, 2017 at 03:12:38PM +0200, Jan Kara wrote:
So the functionality these patches implement: We have an inode flag
(I abuse the S_SYNC inode flag for this and IMHO it kind of makes sense, but
if people hate that I'm certainly open to using a new flag in the final
implementation) that marks an inode as requiring synchronous page faults.
The guarantee provided by this flag on inode is: While a block is writeably
mapped into page tables, it is guaranteed to be visible in the file at that
offset also after a crash.
I think the right interface for page fault behavior is a mmap
flag, MAP_SYNC or similar, which will be optional and a failure of
a MAP_SYNC mmap will indicate that this behavior can't be provided
for the given file descriptor.
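
A minimal userspace sketch of the "optional flag, failure means unsupported"
pattern described above. It assumes the interface Linux later adopted
(MAP_SYNC combined with MAP_SHARED_VALIDATE); the fallback #defines use the
generic Linux values and are only there so the sketch builds against older
headers. map_prefer_sync() and demo_map_scratch_file() are hypothetical
helper names, not part of any patch in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values (generic Linux headers); defined only if the libc lacks them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: ask for a synchronous-fault mapping first and fall
 * back to a plain shared mapping if the kernel or filesystem refuses.
 * *got_sync tells the caller which one it received.
 * Returns NULL if both attempts fail.
 */
static void *map_prefer_sync(int fd, size_t len, int *got_sync)
{
    assert(len > 0 && got_sync != NULL);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *got_sync = 1;
        return p;
    }

    /* The request was rejected: the guarantee is not available for this fd. */
    *got_sync = 0;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Usage sketch: map one page of a scratch file and store a byte through it.
 * Returns 1 if a MAP_SYNC mapping was granted, 0 if the fallback was used,
 * and -1 on error.
 */
static int demo_map_scratch_file(void)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;

    int fd = fileno(f);
    size_t len = 4096;
    int got_sync = -1;
    int rc = -1;

    if (ftruncate(fd, (off_t)len) == 0) {
        char *p = map_prefer_sync(fd, len, &got_sync);
        if (p) {
            p[0] = 'x';        /* first store takes the write fault */
            rc = got_sync;
            munmap(p, len);
        }
    }
    fclose(f);
    return rc;
}
```

On a filesystem without DAX the first mmap() is expected to fail and the
caller ends up with got_sync == 0, so it knows it still has to fsync() or
msync() to make its stores durable.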
From my (fairly limited) knowledge of XFS it seems XFS should be able to do
the same and it should be even possible for filesystem to implement safe
remapping of a file offset to a different block (i.e. break reflink, do
defrag, or similar stuff) like:
It should. But what I'm worried about for both ext4 and XFS is the
worst case behavior that the page fault path can now hit, e.g. flushing
a potentially full log. Do you have any numbers of how long your
ext4 page faults take with this in the worst case?
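
One rough way to collect such numbers from userspace is to time the first
store to each page of a shared file mapping and keep the maximum. The sketch
below does only that: it uses a plain MAP_SHARED mapping and does nothing to
fill the journal beforehand, so it is a starting point for a measurement and
not a worst-case benchmark. worst_write_fault_ns() and demo_fault_timing()
are hypothetical names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Size the file to npages pages, map it shared, and time the first store to
 * each page. Each first store triggers a write fault, so the elapsed time is
 * dominated by the fault path. The slowest fault is stored in *worst_out.
 * Returns 0 on success, -1 on error.
 */
static int worst_write_fault_ns(int fd, size_t npages, uint64_t *worst_out)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || npages == 0)
        return -1;

    size_t len = npages * (size_t)psz;
    if (ftruncate(fd, (off_t)len) != 0)
        return -1;

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    uint64_t worst = 0;
    for (size_t i = 0; i < npages; i++) {
        uint64_t t0 = now_ns();
        /* volatile so the compiler cannot drop or reorder the store */
        *(volatile char *)(map + i * (size_t)psz) = 1;
        uint64_t dt = now_ns() - t0;
        if (dt > worst)
            worst = dt;
    }

    munmap(map, len);
    *worst_out = worst;
    return 0;
}

/* Usage sketch: run the measurement over 64 pages of a scratch file. */
static int demo_fault_timing(void)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;

    uint64_t worst = 0;
    int rc = worst_write_fault_ns(fileno(f), 64, &worst);
    if (rc == 0)
        printf("worst first-write fault: %llu ns\n",
               (unsigned long long)worst);
    fclose(f);
    return rc;
}
```

To say anything about the case raised here, the file would have to live on
the filesystem under test and the run would need concurrent metadata load
that keeps the log close to full.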
There are a couple of open questions with this implementation:
1) Is it worth the hassle?
For that I'd really like to see performance numbers. And compared to
the immutable nightmare that Dan proposed this looks orders of magnitude
better.
2) Is S_SYNC good flag to use or should we use a new inode flag?
I think the right interface is mmap as said above. But even if not
we should not simply reuse existing flags with a well defined (although
not particularly useful) behavior.
3) VM_FAULT_RO and especially passing of resulting 'pfn' from
dax_iomap_fault() through filesystem fault handler to dax_pfn_mkwrite() in
vmf->orig_pte is a bit of a hack. So far I'm not sure how to refactor
things to make this cleaner.
I'll take a look.