From: SPDK [mailto:email@example.com] On Behalf Of Walker, Benjamin
Sent: Thursday, August 23, 2018 3:34 PM
Subject: Re: [SPDK] RDMA qp recovery: follow up on latest changes
> On Thu, 2018-08-23 at 04:08 +0000, Philipp Skadorov wrote:
> > From: SPDK [mailto:firstname.lastname@example.org] On Behalf Of Walker,
> > Benjamin
> > On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> > > Hello Benjamin,
> > > There have been a series of changes to the QP error recovery that
> > > I'd like to follow up:
> > >
> > > ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> > > ibv_query_qp results in a write syscall to the SNIC driver, which is
> > > a context switch.
> > > The original code was calling it when the state is known to
> > > change; the cached value was used everywhere else.
> > The observation driving this patch was that the cached value was
> > always updated before being used, so keeping the cached value wasn't
> > saving any calls. Do you see anywhere in the code on master where a
> > call to ibv_query_qp could be eliminated?
> I thought it is enough to update the QP state only after a QP async event
> is received.
> Do you see cases where it is not?
So you're saying we probably don't need the one at the top of
spdk_nvmf_rdma_qpair_recover() nor the one in
_spdk_nvmf_rdma_qp_error() because any unsolicited state change will
occur only along with an associated asynchronous event. I think I agree with
that statement. I'd revert the patch that changed this, but the code has
drifted a bit, so I'll probably have to cook up a new one.
> > The only one I see that might be
> > questionable is the one in _spdk_nvmf_rdma_qp_error.
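The caching trade-off under discussion can be sketched as follows. All of the types and names below are illustrative stand-ins, not the actual SPDK or ibverbs structures: the idea is to refresh a cached QP state only when an async event implies a state change, instead of paying an ibv_query_qp() syscall before every use.

```c
enum qp_state { QPS_RTS, QPS_SQD, QPS_ERR };   /* stand-in for ibv_qp_state */
enum qp_event { EV_SQ_DRAINED, EV_QP_FATAL, EV_LAST_WQE_REACHED };

struct qpair {
    enum qp_state cached_state;
    int query_calls;          /* counts simulated ibv_query_qp() syscalls */
};

/* Stand-in for ibv_query_qp(): each call is a context switch into the
 * driver, which is exactly the cost being debated. */
static enum qp_state
query_qp_state(struct qpair *qp, enum qp_event ev)
{
    qp->query_calls++;
    switch (ev) {
    case EV_QP_FATAL:   return QPS_ERR;
    case EV_SQ_DRAINED: return QPS_SQD;
    default:            return qp->cached_state;
    }
}

/* Refresh the cache only when the event implies a state change; every
 * other consumer reads qp->cached_state with no syscall at all. */
static void
on_async_event(struct qpair *qp, enum qp_event ev)
{
    if (ev == EV_QP_FATAL || ev == EV_SQ_DRAINED) {
        qp->cached_state = query_qp_state(qp, ev);
    }
}
```

Which events actually imply a state change is the open question in this thread; the selection above is only one plausible choice.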
> > >
> > > ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> > > spdk_nvmf_rdma_recover
> > > ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> > > ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't
> > > trigger error recovery on IBV_EVENT_SQ_DRAINED
> > > De-allocating the resources associated with the requests being
> > > processed by the SNIC at the time of IBV_EVENT_QP_FATAL is too early.
> > > The IBV standard requires that SPDK "wait for the Affiliated
> > > Asynchronous Last WQE Reached Event" before manipulating the QP
> > > state.
> > > I would also assume it is unsafe to return the associated requests
> > > and their data back to the pools before the "Last WQE Reached" event.
> > I agree, but I think the code on master does that already with just
> > one twist.
> > We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event does not
> > occur if an error occurred on an RDMA queue pair which had no
> > outstanding I/O. So we can't always wait for that event before
> > releasing resources. Instead, when we first are notified of the
> > error via IBV_EVENT_QP_FATAL, we abort all outstanding commands and
> > then attempt to do the recovery. The recovery quits early if there
> > are RDMA operations outstanding though.
> Looking at rdma.c, func _spdk_nvmf_rdma_qp_error:
> 2085 _spdk_nvmf_rdma_qp_cleanup_all_states(rqpair);
> It seems to be called unconditionally once the QP is in the error state.
Is the suggestion here to not call _spdk_nvmf_rdma_qp_cleanup_all_states
until after the check for whether there is I/O outstanding? I'm not sure I'm
interpreting what you're saying correctly, but if that's what you're saying,
I think I agree with it.
Sorry, I was not clear - let me rephrase it.
_spdk_nvmf_rdma_qp_cleanup_all_states cleans up requests of three types:
* The requests that could be cleaned up at any time
* RDMA-held requests (that are being processed by the SNIC)
* BDEV-held requests (that are being processed by the bdev layer)
The function _spdk_nvmf_rdma_qp_cleanup_all_states should be split:
* RDMA-held requests should be cleaned up only after IBV_EVENT_QP_LAST_WQE_REACHED.
* BDEV-held requests should be awaited to complete naturally (unless you know how to abort them).
* "The requests that could be cleaned up at any time" could be cleaned up at
any time :)
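The proposed split could look roughly like this (hypothetical types and names; the real SPDK request objects track their state differently): classify each request by its current owner and release only what is safe to release at that point in the error sequence.

```c
#include <stdbool.h>
#include <stddef.h>

enum req_owner {
    OWNER_NONE,  /* idle or queued: safe to release at any time */
    OWNER_RDMA,  /* the SNIC is processing it: wait for LAST_WQE_REACHED */
    OWNER_BDEV,  /* the bdev layer is processing it: let it complete */
};

struct req {
    enum req_owner owner;
    bool freed;
};

/* Release only the requests whose owner allows it at this point. */
static int
cleanup_requests(struct req *reqs, size_t n, bool last_wqe_reached)
{
    int cleaned = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].freed) {
            continue;
        }
        switch (reqs[i].owner) {
        case OWNER_NONE:
            reqs[i].freed = true;    /* safe at any time */
            cleaned++;
            break;
        case OWNER_RDMA:
            if (last_wqe_reached) {  /* the SNIC has flushed every WQE */
                reqs[i].freed = true;
                cleaned++;
            }
            break;
        case OWNER_BDEV:
            /* never forcibly freed; the bdev completion releases it */
            break;
        }
    }
    return cleaned;
}
```

Calling this once on IBV_EVENT_QP_FATAL and again on IBV_EVENT_QP_LAST_WQE_REACHED would release each class exactly when it becomes safe.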
> > In that case, we'll later get the
> > IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same recovery
> > path, but not bail out early. This was all mostly figured out through
> > trial and error when force disconnecting the NVMe-oF initiator. If
> > you see any flaws in the logic let me know so we can get them
> > corrected.
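A minimal sketch of the recovery ordering described above, with hypothetical names standing in for the SPDK internals: IBV_EVENT_QP_FATAL aborts the outstanding commands and attempts recovery, bailing out early while RDMA operations are still outstanding, and the later IBV_EVENT_QP_LAST_WQE_REACHED retries without bailing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the qpair state relevant to this flow. */
struct rqpair {
    int  outstanding_rdma_ops;   /* WQEs still owned by the SNIC */
    bool aborted;
    bool recovered;
};

/* Reset the QP and release resources, but bail out early if the SNIC
 * still owns WQEs and we are allowed to wait for LAST_WQE_REACHED. */
static void
qp_recover(struct rqpair *qp, bool allow_early_exit)
{
    if (allow_early_exit && qp->outstanding_rdma_ops > 0) {
        return;                  /* LAST_WQE_REACHED will retry later */
    }
    qp->recovered = true;
}

/* First notification of the error: abort everything, then try to recover. */
static void
on_qp_fatal(struct rqpair *qp)
{
    qp->aborted = true;          /* abort all outstanding commands */
    qp_recover(qp, true);
}

/* The SNIC has flushed every WQE; recovery must not bail out now. */
static void
on_last_wqe_reached(struct rqpair *qp)
{
    qp->outstanding_rdma_ops = 0;
    qp_recover(qp, false);
}
```

This also covers the case observed above where LAST_WQE_REACHED never arrives: a qpair with no outstanding I/O recovers directly from the QP_FATAL handler.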
> > >
> > > Thanks,
> > > Philipp