On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
> Perhaps I am not following what Serguei is asking for, but I
> understood the desire was for a complex GPU allocator that could
> migrate pages between GPU and CPU memory under control of the GPU
> driver, among other things. The desire is for DMA to continue to work
> even after these migrations happen.
The main issue is how to handle use cases where p2p is
requested/initiated via CPU pointers, and where such pointers could
point to a non-system memory location, e.g. VRAM.
This would give the user a consistent programming model based purely on
pointers (HSA, CUDA, OpenCL 2.0 SVM), and it is also a performance
optimization: it avoids double-buffering and extra special-case code
when dealing with PCIe device memory.
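To illustrate what "dealing only with pointers" means, a rough
userspace sketch (only pread() itself is a real API; "vram_ptr" and the
helper name are assumptions, standing in for a CPU-visible mapping of
device memory from some SVM-style allocator):

    #include <unistd.h>

    /* Hypothetical: vram_ptr is assumed to be a CPU pointer that maps
     * device memory (e.g. an SVM-style allocation), not an existing API. */
    static int read_file_into_vram(int fd, void *vram_ptr, size_t len)
    {
            /*
             * Today this typically needs a system-RAM bounce buffer plus a
             * driver-specific copy into VRAM (double-buffering).  The goal
             * is that the plain read below works directly: the kernel
             * resolves the pointer to device pages and the data goes
             * straight to VRAM.
             */
            return pread(fd, vram_ptr, len, 0) == (ssize_t)len ? 0 : -1;
    }

The use cases and requirements: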
- RDMA network operations: RDMA MRs where the registered memory
could be e.g. VRAM. Currently this is solved using the so-called
PeerDirect interface, which is out-of-tree and provided as part of OFED.
- File operations (fread/fwrite) where the user wants to transfer file
data directly to/from e.g. VRAM.
- Because the graphics sub-system must support overcommit (at least,
each application/process should independently see all resources),
ideally such memory should be movable without changing the CPU pointer
value, as well as "paged-out", supporting a "page fault" at least on
access from the CPU.
- We must co-exist with the existing DRM infrastructure, as well as
support sharing VRAM between different processes.
- We should be able to deal with large allocations: tens or hundreds of
MBs, or maybe GBs.
- We may have PCIe devices where p2p may not work
- Potentially any GPU memory should be supported, including
memory carved out from system RAM (e.g. allocated via
- In the case of RDMA MRs, the life-span of the "pinning"
("get_user_pages"/"put_page") may be defined/controlled by the
application, not the kernel, which maybe should be treated differently,
as a special case; see the sketch after this list.
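For the RDMA MR case, roughly the following is what applications do
today through PeerDirect, and what should ideally just work in-tree
(the verbs calls are real; "vram_ptr" and the helper name are again
assumed, standing for a CPU mapping of device memory):

    #include <infiniband/verbs.h>

    static struct ibv_mr *register_vram_mr(struct ibv_pd *pd, void *vram_ptr,
                                           size_t len)
    {
            /* The "pin" created here lives until the application calls
             * ibv_dereg_mr(), i.e. its life-span is application-controlled,
             * potentially for the whole process lifetime. */
            return ibv_reg_mr(pd, vram_ptr, len,
                              IBV_ACCESS_LOCAL_WRITE |
                              IBV_ACCESS_REMOTE_READ |
                              IBV_ACCESS_REMOTE_WRITE);
    }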
The original proposal was to create "struct pages" for VRAM memory
to allow "get_user_pages" to work transparently, similar to how it
is/was done for the "DAX Device" case. Unfortunately, based on my
understanding, the "DAX Device" implementation deals only with
permanently "locked" memory (fixed location), unrelated to the
"get_user_pages"/"put_page" scope, which doesn't satisfy the
requirement for "eviction"/"moving" of memory while keeping the CPU
address intact.
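For clarity, the attraction of "struct pages" for VRAM is that the
existing driver pin/unpin pattern would work unchanged; a minimal
sketch, assuming the ~4.9-era get_user_pages_fast() signature (the
helper names are illustrative only):

    #include <linux/mm.h>

    static int pin_user_range(unsigned long uaddr, int nr_pages,
                              struct page **pages)
    {
            /* With struct pages backing VRAM, this would succeed
             * transparently for VRAM-backed VMAs as well as ordinary
             * anonymous memory. */
            return get_user_pages_fast(uaddr, nr_pages, 1 /* write */, pages);
    }

    static void unpin_user_range(struct page **pages, int nr_pages)
    {
            int i;

            /* Between pin and unpin the backing memory cannot be moved,
             * which is exactly what conflicts with eviction/moving while
             * keeping the CPU address intact. */
            for (i = 0; i < nr_pages; i++)
                    put_page(pages[i]);
    }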
> The desire is for DMA to continue to work
> even after these migrations happen
At least some kind of MMU notifier callback to inform about the change
in location (pre- and post-) would be needed, similar to how it is done
for system pages.
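Something along the lines of the existing MMU notifier pre-/post-
pairing, e.g. (not a new API proposal, just to show the shape of the
callbacks meant here; the p2p_* names are made up):

    #include <linux/mmu_notifier.h>

    static void p2p_invalidate_range_start(struct mmu_notifier *mn,
                                           struct mm_struct *mm,
                                           unsigned long start,
                                           unsigned long end)
    {
            /* "pre": quiesce/stop DMA targeting [start, end) before the
             * backing memory is moved or paged out. */
    }

    static void p2p_invalidate_range_end(struct mmu_notifier *mn,
                                         struct mm_struct *mm,
                                         unsigned long start,
                                         unsigned long end)
    {
            /* "post": look up the new location and resume DMA. */
    }

    static const struct mmu_notifier_ops p2p_mn_ops = {
            .invalidate_range_start = p2p_invalidate_range_start,
            .invalidate_range_end   = p2p_invalidate_range_end,
    };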
My understanding is that it will not solve the RDMA MR issue, where the
"lock" could last for the whole application lifetime, but (a) it will
not make the RDMA MR case worse and (b) it should be enough for all
other cases where "get_user_pages"/"put_page" is controlled by the
kernel.