SPDK environment initialization from a DPDK application
by Nalla, Pradeep
Hello
I have a requirement for a DPDK application that forwards packets and also handles NVMe commands. Since DPDK's EAL is already initialized
by the forwarding application, calling rte_eal_init again inside spdk_env_init fails with rte_errno set to EALREADY. Is there
a way for SPDK's env_dpdk library to account for an already initialized DPDK?
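A minimal C sketch of the scenario described above (hypothetical application name "fwd_app"; not from the original mail): the DPDK forwarding application brings up the EAL itself and then attempts to initialize the SPDK environment on top of it. The second step is where EALREADY shows up, because spdk_env_init() runs rte_eal_init() internally. Later SPDK releases added an spdk_env_dpdk_post_init() hook in env_dpdk for exactly this situation, but its availability depends on the SPDK version in use, so treat that as an assumption to verify against your tree.
/* Minimal sketch, assuming a DPDK forwarding application ("fwd_app" is a
 * hypothetical name) that initializes the EAL itself and then attempts to
 * bring up the SPDK environment on top of it. */
#include <stdio.h>
#include <rte_eal.h>
#include <rte_errno.h>
#include "spdk/env.h"

int
main(int argc, char **argv)
{
	struct spdk_env_opts opts;

	/* The forwarding application owns EAL initialization. */
	if (rte_eal_init(argc, argv) < 0) {
		fprintf(stderr, "rte_eal_init failed: %s\n",
			rte_strerror(rte_errno));
		return 1;
	}

	/* spdk_env_init() calls rte_eal_init() again internally, which is
	 * what produces the EALREADY error described in this thread. */
	spdk_env_opts_init(&opts);
	opts.name = "fwd_app";
	if (spdk_env_init(&opts) < 0) {
		fprintf(stderr,
			"spdk_env_init failed (EAL already initialized?)\n");
		return 1;
	}

	/* ... packet forwarding and NVMe command handling would go here ... */
	return 0;
}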
Thanks
Pradeep.
For the NVMe-oF TCP transport in SPDK
by Yang, Ziye
Hi all,
The NVMe TCP transport has just been merged into the SPDK master branch, and there is still some work to do to harden and improve it. For example, we currently need to do a lot of interoperability testing with the Linux kernel host/target to guarantee the functionality first. If you submit patches related to this TCP transport, could you also add me as a reviewer (you can search for my name on review.gerrithub.io)? Then I will not miss your patch (though I will keep watching the patches with some filters).
Thanks.
Best Regards
Ziye Yang
Re: [SPDK] nvmf_tgt seg fault
by Harris, James R
Thanks for the report, Joe. Could you file an issue on GitHub for this?
https://github.com/spdk/spdk/issues
Thanks,
-Jim
On 11/14/18, 12:14 PM, "SPDK on behalf of Gruher, Joseph R" <spdk-bounces(a)lists.01.org on behalf of joseph.r.gruher(a)intel.com> wrote:
Hi everyone-
I'm running a dual-socket Skylake server with P4510 NVMe drives and a 100Gb Mellanox CX4 NIC. The OS is Ubuntu 18.04 with kernel 4.18.16, SPDK version is 18.10, and FIO version is 3.12. I'm running the SPDK NVMe-oF target and exercising it from an initiator system (similar config to the target but with a 50Gb NIC) using FIO with the bdev plugin. I find that 128K sequential workloads reliably and immediately seg fault nvmf_tgt. I can run 4KB random workloads without hitting the seg fault, so the problem seems tied to the block size and/or IO pattern. I can run the same IO pattern against a local PCIe device using SPDK without a problem; I only see the failure when running the NVMe-oF target with FIO driving the IO pattern from an SPDK initiator system.
Steps to reproduce and seg fault output follow below.
Start the target:
sudo ~/install/spdk/app/nvmf_tgt/nvmf_tgt -m 0x0000F0 -r /var/tmp/spdk1.sock
Configure the target:
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d1 -t pcie -a 0000:1a:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d2 -t pcie -a 0000:1b:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d3 -t pcie -a 0000:1c:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d4 -t pcie -a 0000:1d:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d5 -t pcie -a 0000:3d:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d6 -t pcie -a 0000:3e:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d7 -t pcie -a 0000:3f:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_nvme_bdev -b d8 -t pcie -a 0000:40:00.0
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_raid_bdev -n raid1 -s 4 -r 0 -b "d1n1 d2n1 d3n1 d4n1 d5n1 d6n1 d7n1 d8n1"
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_store raid1 store1
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l1 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l2 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l3 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l4 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l5 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l6 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l7 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l8 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l9 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l10 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l11 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock construct_lvol_bdev -l store1 l12 1200000
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn1 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn2 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn3 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn4 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn5 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn6 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn7 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn8 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn9 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn10 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn11 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_create nqn.2018-11.io.spdk:nqn12 -a
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn1 store1/l1
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn2 store1/l2
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn3 store1/l3
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn4 store1/l4
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn5 store1/l5
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn6 store1/l6
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn7 store1/l7
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn8 store1/l8
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn9 store1/l9
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn10 store1/l10
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn11 store1/l11
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_ns nqn.2018-11.io.spdk:nqn12 store1/l12
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn1 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn2 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn3 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn4 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn5 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn6 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn7 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn8 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn9 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn10 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn11 -t rdma -a 10.5.0.202 -s 4420
sudo ./rpc.py -s /var/tmp/spdk1.sock nvmf_subsystem_add_listener nqn.2018-11.io.spdk:nqn12 -t rdma -a 10.5.0.202 -s 4420
FIO file on initiator:
[global]
rw=rw
rwmixread=100
numjobs=1
iodepth=32
bs=128k
direct=1
thread=1
time_based=1
ramp_time=10
runtime=10
ioengine=spdk_bdev
spdk_conf=/home/don/fio/nvmeof.conf
group_reporting=1
unified_rw_reporting=1
exitall=1
randrepeat=0
norandommap=1
cpus_allowed_policy=split
cpus_allowed=1-2
[job1]
filename=b0n1
Config file on initiator:
[Nvme]
TransportID "trtype:RDMA traddr:10.5.0.202 trsvcid:4420 subnqn:nqn.2018-11.io.spdk:nqn1 adrfam:IPv4" b0
Run FIO on the initiator and nvmf_tgt seg faults immediately:
sudo LD_PRELOAD=/home/don/install/spdk/examples/bdev/fio_plugin/fio_plugin fio sr.ini
Seg fault looks like this:
mlx5: donsl202: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000001 00000000 00000000 00000000
00000000 9d005304 0800011b 0008d0d2
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660105216 (4): local protection error
rdma.c: 501:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660105216 (5): Work Request Flushed Error
rdma.c: 501:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660106280 (5): Work Request Flushed Error
rdma.c: 501:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c:2698:spdk_nvmf_rdma_poller_poll: *WARNING*: CQ error on CQ 0x7f079c01d170, Request 0x139670660106280 (5): Work Request Flushed Error
Segmentation fault
Adds this to dmesg:
[71561.859644] nvme nvme1: Connect rejected: status 8 (invalid service ID).
[71561.866466] nvme nvme1: rdma connection establishment failed (-104)
[71567.805288] reactor_7[9166]: segfault at 88 ip 00005630621e6580 sp 00007f07af5fc400 error 4 in nvmf_tgt[563062194000+df000]
[71567.805293] Code: 48 8b 30 e8 82 f7 ff ff e9 7d fe ff ff 0f 1f 44 00 00 41 81 f9 80 00 00 00 75 37 49 8b 07 4c 8b 70 40 48 c7 40 50 00 00 00 00 <49> 8b 96 88 00 00 00 48 89 50 58 49 8b 96 88 00 00 00 48 89 02 48
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
Community meeting and use of new conf tool
by Luse, Paul E
FYI we had 19 people on this morning's call and nobody had any issues with connecting, audio, sharing, or random music playing in the background (LOL).
After we try this in next week's Asia time-zone call, we'll officially declare it our new conf tool and update the website.
Thanks!!!
Paul
2nd REMINDER - we're not using WebEx for this morning's Community Meeting Call
by Luse, Paul E
See below...
From: Luse, Paul E
Sent: Monday, November 26, 2018 4:42 PM
To: 'Storage Performance Development Kit' <spdk(a)lists.01.org>
Subject: *** CONF INFO FOR COMMUNITY MEETING TOMORROW ***
All,
I'll send out a few reminders for tomorrow's call since a couple of folks missed them last time. We're not "officially" changing over just yet because we couldn't try the service with last week's Asia call, which was cancelled due to the holiday in the US.
Anyway, below is the info for tomorrow's call. I'll put it on IRC as well.
Thanks!
Paul
Tue Nov 27 Euro Call:
Access code: 652734#
Online meeting ID: paul_e_luse
Join the online meeting: https://join.freeconferencecall.com/paul_e_luse
For the Chandler testing pool
by Yang, Ziye
Hi all,
These days I have found that the Chandler testing pool is very busy. During PRC working hours there are lots of patches waiting to be tested. For example, today I can see 27 patches in the list. Since each patch takes about 12 minutes to test, the total time will be about 324 minutes, which means that if you submit a patch now you need to wait at least 5.5 hours for results (you can see them after 14:30). Since we are a global team and the Poland team starts work at around 4:00 PM our time, this leaves little time for PRC contributors during the working day; usually I get the results of my patches after my working hours. So my question is: can we reduce this turnaround, since the testing time is currently too long?
PS: Why do we need the Chandler test pool? Because it has different environments that you cannot cover in a single local test environment.
Thanks.
Best Regards
Ziye Yang
retrigger inconsistencies
by Luse, Paul E
Last week Jim mentioned a few cases where Jenkins failed to pick up a retrigger, and it seems like those might be understood (a missed event; we need to use the REST API to poll). I ran into one today on the CH TP, and although we're going to retire that system soon, it would be good to understand how this one was missed, as I was under the impression that it was already polling GH. See https://review.gerrithub.io/c/spdk/spdk/+/433727
Thanks!
Paul
*** CONF INFO FOR COMMUNITY MEETING TOMORROW ***
by Luse, Paul E
All,
I'll send out a few reminders for tomorrow's call since a couple of folks missed them last time. We're not "officially" changing over just yet because we couldn't try the service with last week's Asia call, which was cancelled due to the holiday in the US.
Anyway, below is the info for tomorrow's call. I'll put it on IRC as well.
Thanks!
Paul
Tue Nov 27 Euro Call:
Access code: 652734#
Online meeting ID: paul_e_luse
Join the online meeting: https://join.freeconferencecall.com/paul_e_luse