Hi,
This is about multipath failover support (NVMe over TCP) on the initiator, with SPDK v22.01-pre git
sha1 be2d126fd / DPDK 21.08.0.
I was trying to test the failover scenario wherein my fio workload (running on the SPDK initiator,
accessing a remote NVMe device on a target over TCP) is supposed to continue running even when
the primary path to the volume/device goes down.
I need suggestions/help in finding:
1. whether v22.01-pre (git sha1 be2d126fd) has functional failover support on the SPDK initiator;
2. what is wrong in the bdev_new.conf below with respect to the multipath and timeout settings;
3. whether there is any example configuration demonstrating successful failover when the initiator
is SPDK.
Details at the bottom of this mail.
Thanks
Jitendra
As the following nvme discover output shows, there were two paths to reach the volume, via
traddr_ip1 and traddr_ip2. The path via traddr_ip1 is the primary path and the path via traddr_ip2
is the secondary path. Both paths were live initially, as shown by nvme list-subsys.
nvme discover -t tcp -a <discovery controller ip> -q <hostnqn>
Discovery Log Number of Records 2, Generation counter 1
=====Discovery Log Entry 0======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: <subnqn>
traddr: traddr_ip1
sectype: none
=====Discovery Log Entry 1======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: <subnqn>
traddr: traddr_ip2
sectype: none
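(For reference, each discovery entry maps to a connect of the following form with the kernel
initiator (nvme-cli); this is only to illustrate the two-path layout, not something the SPDK fio
plugin uses:

nvme connect -t tcp -a traddr_ip1 -s 4420 -n <subnqn> -q <hostnqn>
nvme connect -t tcp -a traddr_ip2 -s 4420 -n <subnqn> -q <hostnqn>
)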
Using this info from the nvme discover output, I wrote bdev_new.conf as follows:
cat mypath/jlulla/spdk/build/fio/bdev_new.conf
{
  "subsystems": [
    {
      "subsystem": "bdev",
      "config": [
        {
          "method": "bdev_nvme_set_options",
          "params": {
            "retry_count": 254,
            "timeout_us": 255000000,
            "keep_alive_timeout_ms": 255000,
            "transport_retry_count": 254,
            "bdev_retry_count": -1,
            "nvme_adminq_poll_period_us": 100000,
            "nvme_ioq_poll_period_us": 0,
            "action_on_timeout": "reset"
          }
        },
        {
          "method": "bdev_nvme_attach_controller",
          "params": {
            "name": "Nvme0",
            "trtype": "tcp",
            "traddr": "traddr_ip1",
            "trsvcid": "4420",
            "subnqn": "<subnqn>",
            "adrfam": "IPv4",
            "hostnqn": "<hostnqn>",
            "multipath": "failover"
          }
        },
        {
          "method": "bdev_nvme_attach_controller",
          "params": {
            "name": "Nvme0",
            "trtype": "tcp",
            "traddr": "traddr_ip2",
            "trsvcid": "4420",
            "subnqn": "<subnqn>",
            "adrfam": "IPv4",
            "multipath": "failover",
            "hostnqn": "<hostnqn>"
          }
        }
      ]
    }
  ]
}
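For reference, my understanding is that the two bdev_nvme_attach_controller entries above are
intended to be equivalent to the following runtime RPC calls (a rough sketch; I have not verified
the exact flag spellings against this SPDK version, and rpc.py needs an app running an RPC server,
which the fio plugin does not provide):

scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a traddr_ip1 -s 4420 -f IPv4 -n <subnqn> -q <hostnqn> -x failover
scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a traddr_ip2 -s 4420 -f IPv4 -n <subnqn> -q <hostnqn> -x failover

The intent is that the second call adds traddr_ip2 as an alternate path under the same Nvme0 name
instead of creating a second controller/bdev.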
My fio job file is shown below; the total bytes to be written were 64 GiB. Note that
filename=Nvme0n1 refers to the namespace bdev created from controller Nvme0 (namespace 1) by the
attach calls above.
root@mymachine:~/jlulla/spdk# cat mypath/jlulla/fio_spdk_write_profile.fio
[global]
ioengine=spdk_bdev
spdk_conf=mypath/jlulla/spdk/build/fio/bdev_new.conf
direct=1
thread=1
prio=0
rw=write
bs=4k
numjobs=1
iodepth=16
verify=md5
do_verify=0
size=100%
norandommap
randrepeat=0
group_reporting=1
[job1]
filename=Nvme0n1
However, when I ran fio as shown below, it ended up with two issues:
jlulla/spdk# LD_PRELOAD=mypath/jlulla/spdk/build/fio/spdk_bdev fio
mypath/jlulla/fio_spdk_write_profile.fio
Issue 1: error -32602, Invalid parameters
job1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B,
ioengine=spdk_bdev, iodepth=16
fio-3.18.random_compress_support
Starting 1 thread
[2021-11-10 13:39:29.871918] Starting SPDK v22.01-pre git sha1 be2d126fd / DPDK 21.08.0
initialization...
[2021-11-10 13:39:29.872144] [ DPDK EAL parameters: [2021-11-10 13:39:29.872191] fio
[2021-11-10 13:39:29.872224] --no-shconf [2021-11-10 13:39:29.872259] -c 0x1 [2021-11-10
13:39:29.872287] --log-level=lib.eal:6 [2021-11-10 13:39:29.872318]
--log-level=lib.cryptodev:5 [2021-11-10 13:39:29.872353] --log-level=user1:6 [2021-11-10
13:39:29.872383] --iova-mode=pa [2021-11-10 13:39:29.872411]
--base-virtaddr=0x200000000000 [2021-11-10 13:39:29.872440] --match-allocations
[2021-11-10 13:39:29.872505] --file-prefix=spdk_pid26564 [2021-11-10 13:39:29.872537] ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2021-11-10 13:39:30.077456] accel_engine.c:1012:spdk_accel_engine_initialize: *NOTICE*:
Accel engine initialized to use software engine.
[2021-11-10 13:39:30.482123] json_config.c: 221:rpc_client_poller: *ERROR*: error
response:
{
"code": -32602,
"message": "Invalid parameters"
}
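To narrow down which method in the config is being rejected (the fio plugin output above does not
say), one option is to load the same JSON in a standalone SPDK app and read its log; a sketch,
assuming the spdk_tgt binary built from this tree and that it accepts the --json option:

build/bin/spdk_tgt --json mypath/jlulla/spdk/build/fio/bdev_new.conf

If one of the bdev_nvme_set_options or bdev_nvme_attach_controller parameters is not recognized by
this version, the error should then show up against that specific method.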
Issue 2: fio started writing to the device, and after some time the primary path was made
unavailable.
2A: fio immediately stalled as shown below (the ETA kept growing while the completion percentage
stayed flat), but I was expecting it to continue writing via the secondary path.
Jobs: 1 (f=1): [W(1)][48.4%][eta 06m:04s]
Jobs: 1 (f=1): [W(1)][48.4%][eta 06m:12s]
Jobs: 1 (f=1): [W(1)][48.4%][eta 06m:50s]
Jobs: 1 (f=1): [W(1)][48.4%][eta 07m:29s]
Jobs: 1 (f=1): [W(1)][48.5%][eta 11m:08s]
2B: Eventually, fio ended up showing Input/output error.
[2021-11-10 13:50:07.876969] bdev_nvme.c:2361:timeout_cb: *WARNING*: Warning: Detected a
timeout. ctrlr=0x7f7c800d6bd0 qpair=(nil) cid=5
[2021-11-10 13:50:07.877344] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*:
aborting queued i/o
[2021-11-10 13:50:07.877432] nvme_qpair.c: 537:nvme_qpair_manual_complete_request:
*NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877449] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*:
WRITE sqid:1 cid:0 nsid:1 lba:8117663 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877461] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*:
ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[... the same "aborting queued i/o" / "Command completed manually" sequence repeats for WRITE
commands at lba:8117664 through lba:8117670 ...]
fio: io_u error on file Nvme0n1: Input/output error: write offset=33249947648,
buflen=4096
fio: io_u error on file Nvme0n1: Input/output error: write offset=33249951744, buflen=4096
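For completeness: when reproducing this under an SPDK app with an RPC server (not possible in the
fio plugin run above), I would expect the per-path controller state to be visible via something
like the following (hedged; I have not confirmed what this reports during a path-down event on
this version):

scripts/rpc.py bdev_nvme_get_controllers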