On Mon, Nov 28, 2016 at 06:19:40PM +0000, Haggai Eran wrote:
> > GPU memory. We create a non-ODP MR pointing to VRAM but
> > user-space and the GPU not to migrate it. If they do, the MR gets
> > destroyed immediately.
> That sounds horrible. How can that possibly work? What if the MR is
> being used when the GPU decides to migrate?
Naturally this doesn't support migration. The GPU is expected to pin
these pages as long as the MR lives. The MR invalidation is done only as
a last resort to keep system correctness.
That just forces applications to handle horrible unexpected
failures. If this sort of thing is needed for correctness then OOM
kill the offending process, don't corrupt its operation.
I think it is similar to how non-ODP MRs rely on user-space today to
keep them correct. If you do something like madvise(MADV_DONTNEED) on a
non-ODP MR's pages, you can still get yourself into a data corruption
situation (HCA sees one page and the process sees another for the same
virtual address). The pinning that we use only guarentees the HCA's page
won't be reused.
That is not really data corruption - the data still goes where it was
originally destined. That is an application violating the
requirements of a MR. An application cannot munmap/mremap a VMA
while a non ODP MR points to it and then keep using the MR.
That is totally different from a GPU driver wanthing to mess with
translation to physical pages.
> From what I understand we are not really talking about kernel
> everything proposed so far is being mediated by a userspace VMA, so
> I'd focus on making that work.
Fair enough, although we will need both eventually, and I hope the
infrastructure can be shared to some degree.
What use case do you see for in kernel?
Presumably in-kernel could use a vmap or something and the same basic