I am working on a project where we need to handle all errors between an NVMF initiator
and target. Below is the list of error types I have identified so far; if you know of any
others, please tell me. Assume there are two paths to the target.
1. Transport errors. These are NVMF RDMA errors reported in the completion queue for a given
namespace. I plan to handle them by resending the failed I/O request on the alternate path.
On the initiator, I have the namespace information, so I can clean it up and resubmit the
I/O on the alternate path. To recover the failed path, I start a poller thread that adds it
back to the set of active paths once the connection is up and working again. My question
for this case is: how do I clean up the failed connection on the target?
2. I/O request timeout. I plan to start a poller thread that periodically checks whether any
I/O requests have been pending longer than the timeout period and removes them from the task
queue that holds all outstanding requests.
3. Target error. If a connection error is seen on the target, is there a way to send a
control message to the initiator so that no more requests are sent on the failed path?
4. Link error. I am investigating this and would like to know whether there is a way to poll
the Mellanox driver to detect a link error.
5. I have heard that the admin queue can be used to send control messages between the
initiator and the target. How can I use the admin queue to send control commands to the target?
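As a minimal sketch of the failover plan in item 1 (and of the polling idea in item 4), the Python below models two paths to one target. Python is used only to keep the sketch short and runnable, and every name in it is hypothetical: `TransportError` stands in for an RDMA completion error, and each path's `submit` and `probe` callables stand in for real I/O submission and for whatever reconnect or link check the initiator actually uses (one possibility for the link side is reading the port state with libibverbs `ibv_query_port()`). A failed I/O is retried on the next active path, and the poller re-adds a failed path once its probe succeeds.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


class TransportError(Exception):
    """Hypothetical stand-in for an RDMA completion error on one path."""


@dataclass
class Path:
    name: str
    # Hypothetical hook: send one I/O on this path, raising TransportError on failure.
    submit: Callable[[bytes], str]
    # Hypothetical hook: return True once the connection/link is usable again.
    probe: Callable[[], bool]


class MultipathInitiator:
    def __init__(self, paths: List[Path]) -> None:
        self.paths: Dict[str, Path] = {p.name: p for p in paths}
        self.active: Set[str] = set(self.paths)  # all paths start active
        self.lock = threading.Lock()              # guards self.active

    def submit(self, io: bytes) -> str:
        """Try active paths in name order; drop a path that fails and retry the I/O on the next."""
        with self.lock:
            candidates = sorted(self.active)
        for name in candidates:
            try:
                return self.paths[name].submit(io)
            except TransportError:
                with self.lock:
                    self.active.discard(name)
        raise TransportError("no active path left")

    def reconnect_once(self) -> None:
        """One poller pass: re-add every failed path whose probe succeeds."""
        with self.lock:
            failed = set(self.paths) - self.active
        for name in failed:
            if self.paths[name].probe():
                with self.lock:
                    self.active.add(name)

    def start_poller(self, interval_s: float, stop: threading.Event) -> threading.Thread:
        """Run reconnect_once every interval_s seconds until stop is set."""
        def loop() -> None:
            while not stop.wait(interval_s):
                self.reconnect_once()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread


if __name__ == "__main__":
    link = {"a_up": False}  # simulated state of path A

    def submit_a(io: bytes) -> str:
        if not link["a_up"]:
            raise TransportError("path A: completion error")
        return "A"

    init = MultipathInitiator([
        Path("A", submit_a, probe=lambda: link["a_up"]),
        Path("B", lambda io: "B", probe=lambda: True),
    ])
    print(init.submit(b"read"))   # B  (A failed, the I/O was retried on B)
    print(sorted(init.active))    # ['B']
    link["a_up"] = True
    init.reconnect_once()
    print(sorted(init.active))    # ['A', 'B']
```

Keeping the active set behind a lock lets the reconnect poller run on its own thread while I/O is being submitted concurrently.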
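The timeout check in item 2 can be sketched the same way. This is a minimal Python sketch under stated assumptions: `TaskQueue`, `add`, `complete`, and `sweep_expired` are hypothetical names, each outstanding request is identified by an integer tag, and `sweep_expired()` is what the poller thread would call on each pass. It removes and returns every request pending strictly longer than the timeout.

```python
import threading
import time
from typing import Callable, Dict, List


class TaskQueue:
    """Hypothetical table of outstanding I/O requests, keyed by command tag."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock                       # injectable for testing
        self.pending: Dict[int, float] = {}      # tag -> submission time
        self.lock = threading.Lock()             # submit path vs. poller thread

    def add(self, tag: int) -> None:
        with self.lock:
            self.pending[tag] = self.clock()

    def complete(self, tag: int) -> None:
        with self.lock:
            self.pending.pop(tag, None)

    def sweep_expired(self) -> List[int]:
        """One poller pass: remove and return tags pending longer than timeout_s."""
        with self.lock:
            now = self.clock()
            expired = [tag for tag, t0 in self.pending.items()
                       if now - t0 > self.timeout_s]
            for tag in expired:
                del self.pending[tag]
        return expired


if __name__ == "__main__":
    fake_now = [0.0]
    q = TaskQueue(timeout_s=5.0, clock=lambda: fake_now[0])
    q.add(1)                  # submitted at t=0
    fake_now[0] = 3.0
    q.add(2)                  # submitted at t=3
    fake_now[0] = 6.0
    print(q.sweep_expired())  # [1]  (pending 6 s; tag 2 only 3 s)
    print(sorted(q.pending))  # [2]
```

The clock is injected so the sweep can be exercised without real delays. Removing an entry only drops it from the local table; the command may still be in flight, so the initiator presumably also needs to abort it or reset the connection before reusing its tag, and can then resubmit it on the other path as in item 1.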
If you have any ideas or suggestions on error handling and recovery for these cases, please
let me know.
Thank you very much for your help,