Thanks for trying it on the latest commit. Also, in response to your last question, the
required configuration should be the same for the kernel and SPDK, but it is possible for
them to run into different issues, since SPDK uses the userspace IBV libraries and
therefore exercises different code paths than the kernel.
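Since we are failing down in the userspace verbs library, one quick way to compare our
setups is to dump the device limits that library reports for the NIC. The snippet below is
only a rough standalone sketch, not part of SPDK (the file name and build line are my own
assumption: cc -o ibv_limits ibv_limits.c -libverbs). It prints the caps that bound what
rdma_create_qp() is allowed to allocate:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);

    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        struct ibv_device_attr attr;

        if (!ctx) {
            fprintf(stderr, "failed to open %s\n", ibv_get_device_name(devs[i]));
            continue;
        }
        if (ibv_query_device(ctx, &attr) == 0) {
            /* Upper bounds on the qp_init_attr.cap values passed to
             * rdma_create_qp(); asking for more makes QP creation fail. */
            printf("%s: max_qp=%d max_qp_wr=%d max_sge=%d max_cqe=%d\n",
                   ibv_get_device_name(devs[i]), attr.max_qp, attr.max_qp_wr,
                   attr.max_sge, attr.max_cqe);
        }
        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}

If max_qp_wr or max_sge comes back much lower than expected on your machine, that would
point at the rdma-core/OFED userspace stack rather than at SPDK itself.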
I revisited this test on two separate machines (last week I just tried in loopback). Here
are the specs for the two machines:
Fedora 27 kernel 4.18.16
Mellanox ConnectX-4 NIC on each
SPDK commit: a817ccf571f
I was unable to replicate your failure across these two machines. For setting up the
target, I simply followed the instructions at https://spdk.io/doc/nvmf.html. I used the
perf command as detailed in your original post, just swapping out the IP address for my
own.
On both machines, I configured SPDK with the following command: ./configure --with-rdma.
Since I am unable to reproduce the error you are seeing, we will have to establish a
baseline for what is different between our setups. I can only identify a few things:
1. We have different kernel versions / different Linux distros (this may affect both the
ibverbs kernel code and the userspace headers we are using).
2. You are using a switch, I am not. As you mentioned in your last response, it would be
odd for the kernel to work in this situation and SPDK not to, especially since we are
failing in an ibv call and not in anything directly in SPDK (see the standalone reproducer
sketch below). However, we are exercising different code paths in this part of the stack,
since we rely on the userspace headers/libraries while the kernel does all of its
processing in kernel space.
3. The flags used to configure SPDK on the host and target may be different (I really
doubt this would cause any issues).
I would still recommend trying it without the switch just once in order to rule that out
as a possibility. Then we can look at other reasons this isn't working on your system.
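If pulling the switch out right away isn't practical, another way to separate SPDK from
the userspace RDMA stack is a tiny standalone program that resolves the target address and
then issues an rdma_create_qp() call similar to the one perf is dying on. The sketch below
is only illustrative (the queue sizes are placeholders, not SPDK's exact values, and the
file name is made up; build with something like cc -o rdma_qp_check rdma_qp_check.c
-lrdmacm -libverbs). If it also reports "Cannot allocate memory" against 10.0.0.14 port
4420, the problem is below SPDK:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <rdma/rdma_cma.h>

int main(int argc, char **argv)
{
    struct rdma_addrinfo hints, *res = NULL;
    struct rdma_cm_id *id = NULL;
    struct ibv_cq *cq;
    struct ibv_qp_init_attr qattr;
    int rc;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <target-ip> <port>\n", argv[0]);
        return 1;
    }

    memset(&hints, 0, sizeof(hints));
    hints.ai_port_space = RDMA_PS_TCP;

    /* Resolve the address and route to the target, as the initiator does
     * before creating its admin queue pair. */
    rc = rdma_getaddrinfo(argv[1], argv[2], &hints, &res);
    if (rc) {
        perror("rdma_getaddrinfo");
        return 1;
    }
    rc = rdma_create_ep(&id, res, NULL, NULL);
    if (rc) {
        perror("rdma_create_ep");
        return 1;
    }

    /* One CQ shared between send and recv, which is typical for an RC QP. */
    cq = ibv_create_cq(id->verbs, 256, NULL, NULL, 0);
    if (!cq) {
        perror("ibv_create_cq");
        return 1;
    }

    memset(&qattr, 0, sizeof(qattr));
    qattr.send_cq = cq;
    qattr.recv_cq = cq;
    qattr.qp_type = IBV_QPT_RC;
    qattr.cap.max_send_wr = 128;  /* placeholder sizes, not SPDK's exact values */
    qattr.cap.max_recv_wr = 128;
    qattr.cap.max_send_sge = 1;
    qattr.cap.max_recv_sge = 1;

    /* The same librdmacm call that fails in your perf run. A NULL pd means
     * the default protection domain for the device is used. */
    rc = rdma_create_qp(id, NULL, &qattr);
    if (rc) {
        fprintf(stderr, "rdma_create_qp: %s\n", strerror(errno));
    } else {
        printf("rdma_create_qp succeeded\n");
        rdma_destroy_qp(id);
    }

    ibv_destroy_cq(cq);
    rdma_destroy_ep(id);
    rdma_freeaddrinfo(res);
    return rc ? 1 : 0;
}

You would run it with the same address you pass to perf, e.g. ./rdma_qp_check 10.0.0.14 4420.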
Thanks,
Seth
-----Original Message-----
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Howell, Seth
Sent: Friday, November 9, 2018 12:45 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] perf failure with RDMA
Hi Chuck,
I know there is some configuration that has to be done when using RDMA across a switch.
Sadly, I don't know exactly what those steps are. You may find some help here
https://community.mellanox.com/docs/DOC-2855 under the section Non-mellanox switch
configuration.
There are a couple of things we can try:
1. Remove the switch to simplify the equation - are the two machines collocated such that
you can just plug the NICs into each other and retry the test?
2. Try it on the latest master - Ben told me that a patch which went in last night and
caused NVMe-oF issues had to be reverted this morning. This probably isn't related to your
issue, but it may be worth trying to cover our bases.
Thanks,
Seth
-----Original Message-----
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Chuck Tuffli
Sent: Friday, November 9, 2018 11:33 AM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] perf failure with RDMA
On Fri, Nov 9, 2018 at 10:22 AM Howell, Seth <seth.howell(a)intel.com> wrote:
Hi Chuck,
We are happy to help you look into this. I am trying to understand
your setup, and have a couple of questions.
Are you running the target and initiator on the same server, or do you
have two different machines connected either directly or through a switch?
There are two servers (both Ubuntu 18.04) connected via a switch. One system is running
perf (i.e. the initiator), the other runs nvmf_tgt.
Are you using RDMA enabled NICs, or software emulation through Soft-RoCE?
Both sides are using Mellanox ConnectX-5 cards.
If connected through a switch, how was the switch configured?
There wasn't much configuration other than setting up a VLAN for these two 10g ports.
Is there something in particular I can provide?
Thanks,
> Seth Howell
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Chuck
> Tuffli
> Sent: Friday, November 9, 2018 10:21 AM
> To: spdk(a)lists.01.org
> Subject: [SPDK] perf failure with RDMA
> Using the latest from git (5aace139), I'm able to run perf to NVMe
> drives over PCIe and run the NVMe-oF target. The Linux kernel NVMe-oF
> driver can successfully connect to the SPDK target, but perf fails
> when connecting over RDMA:
> # uname -a
> Linux nighthawk01 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09
> UTC
> 2018 x86_64 x86_64 x86_64 GNU/Linux
> # grep -i huge /proc/meminfo
> AnonHugePages: 0 kB
> ShmemHugePages: 0 kB
> HugePages_Total: 4096
> HugePages_Free: 4096
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> # ./perf -q 4 -o 512 -w randread -r 'trtype:RDMA adrfam:IPV4
> traddr:10.0.0.14 trsvcid:4420' -t 60
> Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> --base-virtaddr=0x200000000000 --file-prefix=spdk_pid8083 ]
> EAL: Detected 96 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> nvme_rdma.c: 276:nvme_rdma_qpair_init: *ERROR*: rdma_create_qp failed
> perf.dbg: rdam_create_qp: Cannot allocate memory
> nvme_rdma.c: 812:nvme_rdma_qpair_connect: *ERROR*:
> nvme_rdma_qpair_init() failed
> nvme_rdma.c:1388:nvme_rdma_ctrlr_construct: *ERROR*: failed to create
> admin qpair
> nvme.c: 523:spdk_nvme_probe_internal: *ERROR*: NVMe ctrlr scan failed
> spdk_nvme_probe() failed for transport address '10.0.0.14'
> ./perf: errors occured
> # ping -c1 10.0.0.14
> PING 10.0.0.14 (10.0.0.14) 56(84) bytes of data.
> 64 bytes from 10.0.0.14: icmp_seq=1 ttl=64 time=0.157 ms
> --- 10.0.0.14 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.157/0.157/0.157/0.000 ms
> #
> Should this work? Am I missing a step? Note that I did modify perf
> only to print the error returned from rdma_create_qp:
> [598] git diff
> diff --git a/lib/nvme/nvme_rdma.c b/lib/nvme/nvme_rdma.c
> index 30560c17..fd688ee3 100644
> --- a/lib/nvme/nvme_rdma.c
> +++ b/lib/nvme/nvme_rdma.c
> @@ -30,7 +30,7 @@
> * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> */
> -
> +/*XXX*/#include <err.h>
> /*
> * NVMe over RDMA transport
> */
> @@ -274,6 +274,7 @@ nvme_rdma_qpair_init(struct nvme_rdma_qpair *rqpair)
> rc = rdma_create_qp(rqpair->cm_id, rctrlr->pd, &attr);
> if (rc) {
> SPDK_ERRLOG("rdma_create_qp failed\n");
> + warn("rdam_create_qp");
> return -1;
> }
> rctrlr->pd = rqpair->cm_id->qp->pd;
> ---chuck
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk