On Wed, Nov 23, 2016 at 02:58:38PM -0500, Serguei Sagalovitch wrote:
> We do not want to have "highly" dynamic translation due to
> performance cost. We need to support "overcommit" but would
> like to minimize impact. To support RDMA MRs for GPU/VRAM/PCIe
> device memory (which is a must) we need to either globally force
> pinning for the scope of get_user_pages()/put_pages() or have
> special handling for RDMA MRs and similar cases.
As I said, there is no possible special handling. Standard IB hardware
does not support changing the DMA address once a MR is created. Forget
about doing that.
Only ODP hardware allows changing the DMA address on the fly, and it
works at the page table level. We do not need special handling for
RDMA.
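To make the contrast concrete: a standard verbs MR registration pins the buffer and fixes its DMA mapping for the lifetime of the MR; nothing in the API lets the page list change afterwards. A minimal libibverbs sketch (error handling elided; the helper name is illustrative, not from the thread):

```c
#include <stdlib.h>
#include <infiniband/verbs.h>

/* Sketch: register a standard MR. ibv_reg_mr() pins the pages and
 * programs the HCA with their DMA addresses; those addresses are
 * fixed until ibv_dereg_mr() tears the MR down. */
struct ibv_mr *register_pinned(struct ibv_pd *pd, size_t len)
{
	void *buf = malloc(len);
	if (!buf)
		return NULL;
	/* Access flags are illustrative; the point is that the
	 * resulting MR's page list cannot be updated in place. */
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ);
}
```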
> Generally it could be difficult to correctly handle "DMA in
> progress" due to the facts that (a) DMA could originate from
> numerous PCIe devices simultaneously including requests to
> receive network data.
We handle all of this today in kernel via the page pinning mechanism.
This needs to be copied into peer-peer memory and GPU memory schemes
as well. A pinned page means the DMA address cannot be changed and
there is active non-CPU access to it.
Any hardware that does not support page table mirroring must go this
route.
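The pinning mechanism referred to above can be sketched roughly as follows. This is kernel-style code under the get_user_pages()/put_page() scheme of that era (exact signatures varied across kernel versions; the helper name is made up for illustration):

```c
#include <linux/mm.h>

/* Sketch: long-term pin of user pages for DMA. While the references
 * taken by get_user_pages_fast() are held, the kernel will not
 * migrate or reclaim the pages, so their DMA addresses stay valid
 * for the device performing non-CPU access. */
static int pin_for_dma(unsigned long uaddr, int npages,
		       struct page **pages)
{
	int got = get_user_pages_fast(uaddr, npages,
				      1 /* writable */, pages);
	if (got < npages) {
		/* Partial pin: drop whatever we got and fail. */
		while (got > 0)
			put_page(pages[--got]);
		return -EFAULT;
	}
	return 0;
}
```

A peer-to-peer or GPU memory scheme would need an equivalent of this refcounted pin so a device-visible address cannot change while DMA is outstanding.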
> (b) in the HSA case DMA could originate from user space without
> kernel driver knowledge. So without corresponding h/w support
> everywhere I do not see how it could be solved effectively.
All true user triggered DMA must go through some kind of coherent page
table mirroring scheme (eg this is what CAPI does, presumably AMDs HSA
is similar). A page table mirroring scheme is basically the same as
what ODP does.
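For completeness, ODP-capable hardware exposes this page-table-mirroring model through the same verbs call, just with an extra access flag; a hedged sketch:

```c
#include <infiniband/verbs.h>

/* Sketch: an ODP MR. With IBV_ACCESS_ON_DEMAND the HCA faults pages
 * in on first access and the driver mirrors the process page tables,
 * so a page's DMA address may change underneath the MR -- exactly
 * what a standard (pinned) MR cannot tolerate. */
struct ibv_mr *register_odp(struct ibv_pd *pd, void *buf, size_t len)
{
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_ON_DEMAND |
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE);
}
```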
Like I said, this is the direction the industry seems to be moving in,
so any solution here should focus on VMAs/page tables as the way to link
the peer-peer devices.
To me this means at least items #1 and #3 should be removed from
the list.