Hi Daniel, how are you? I performed two types of hotplug tests using Fedora 25
(kernel 4.8.6-300, with the VM and the host running the same kernel version).
1. Test 1 (running hotplug.sh once in a new VM): I created a Fedora 25 image for
the VM and used it to test the unmodified hotplug.sh, and it worked fine. Unmodified,
hotplug.sh performs the test on a newly installed VM every time and destroys
the VM at the end. While this is OK, I think we should also run the test in an environment
where the VM continues to run while the NVMe hotplug operation is performed several times.
This is a practical scenario in a customer environment, so I tried it in Test 2
below.
2. Test 2 (running hotplug.sh several times while the VM continues to run): I created
this small script to start the VM (Fedora 25) on a Fedora 25 host, kernel 4.8.6. I
commented out the portion of hotplug.sh that creates and destroys the VM so that the VM
keeps running after a hotplug.sh run.
[[email protected] ~]# cat c_kvm_with_qemu-2-7.1_helper_33_fedora.sh
#!/bin/bash
base_img=/home/fedora25/fedora25-1.img
IMAGE=/home/fedora25/test_fedora25-1.img
rm -f $IMAGE
MEM=8192M
qemu-img create -b $base_img -f qcow2 $IMAGE 50G
/usr/bin/qemu-system-x86_64 \
-hda $IMAGE \
-net user,hostfwd=tcp::10022-:22 \
-net nic \
-cpu host \
-m ${MEM} \
-pidfile "/tmp/qemu-2-7-1_pid_fedora.txt" \
--enable-kvm \
-chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
-mon chardev=mon0,mode=readline
Then I ran the modified hotplug.sh (as mentioned above), and this is what I found:
i. The VM oopses as it did in my previous report.
ii. In insert_devices(), if I comment out the "ssh_vm scripts/setup.sh" line to
prevent setup.sh from running, the VM does not oops; however, the inserted devices
are not found. It appears that the problem may not be the kernel version, but rather
running setup.sh more than once.
Daniel, each time setup.sh is run, it writes 1024 to /proc/sys/vm/nr_hugepages on the
proc filesystem to allocate huge pages. Could this be the issue, since a prior run had
already allocated huge pages?
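One way to rule the huge page question in or out would be to skip the write when enough
huge pages are already reserved. The following is only a sketch: ensure_hugepages is a
hypothetical helper, not something setup.sh provides, and the optional second argument
exists only so the logic can be tried against an ordinary file.

```shell
# Hypothetical helper (not part of setup.sh): raise the huge page count
# only when the current value is below the requested one.
ensure_hugepages() {
    local want=$1
    # Assumption: default to the proc file mentioned above; a plain file
    # can be passed instead for a dry run.
    local file=${2:-/proc/sys/vm/nr_hugepages}
    local have
    have=$(cat "$file")
    if [ "$have" -ge "$want" ]; then
        echo "already have $have hugepages, leaving $file alone"
        return 0
    fi
    echo "$want" > "$file"
    echo "raised hugepages from $have to $want"
}
```

If the oops still occurs with the second write skipped, the repeated huge page
allocation is probably not the cause.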
Isaac
From: SPDK [mailto:[email protected]] On Behalf Of Isaac Otsiabah
Sent: Friday, May 19, 2017 11:33 AM
To: [email protected]<mailto:[email protected]>
Subject: [SPDK] VM oops while testing SPDK hotplug.sh
Daniel, I have done more testing using hotplug.sh, and here is why one does not see the
problem by running hotplug.sh as is. When hotplug.sh is run, it always creates a new VM to
run the test on and destroys the VM when completed. Therefore, it uses a fresh VM every
time and never gets the chance to do the inserts and removes on a VM that has already run
the same test several times (i.e., 4 to 5 times). When I run hotplug.sh with a fresh VM, it
passes. On the other hand, when I use the same VM that has run the test several times
before, the VM oopses; this is the problem. I also think this is the likely scenario the
customer will experience.
I wasn't clear in my earlier emails because I was still determining the cause of the
problem (at the higher level at least). I think you will see the problem if you do this:
i. In an xterm window, create the VM separately (without the -daemonize
flag). For example:
IMAGE=/home/fedora24/fedora24-2.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
FEDORA_ISO=/tmp/Fedora-Server-dvd-x86_64-24-1.2.iso
qemu_pidfile=/tmp/qemu_pidfile
qemu-2.7.1/x86_64-softmmu/qemu-system-x86_64 \
-hda $IMAGE \
-net user,hostfwd=tcp::10022-:22 \
-net nic \
-cpu host \
-m ${MEM} \
-pidfile "/tmp/qemu_pidfile" \
--enable-kvm \
-chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
-mon chardev=mon0,mode=readline \
-cdrom $FEDORA_ISO
ii. Then comment out the portion of hotplug.sh that creates the VM.
iii. Also comment out these 4 lines at the bottom of hotplug.sh to avoid
killing the VM:
qemupid=`cat "$qemu_pidfile" | awk '{printf $0}'`
kill -9 $qemupid
rm "$qemu_pidfile"
rm "$test_img"
iv. Run hotplug.sh about 5 times, and you will see the oops on the VM
console.
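The repeated runs in step iv can be scripted so that the run on which the VM first
misbehaves is recorded. This is a minimal sketch; run_repeated is a hypothetical helper
and assumes the command it is given returns non-zero when a run fails.

```shell
# Hypothetical helper: run a command up to COUNT times against the same
# long-lived VM and stop at the first failing run.
run_repeated() {
    local count=$1
    shift
    local i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count"
        if ! "$@"; then
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs passed"
}

# Usage (with the modified script, path as in the SPDK repository):
#   run_repeated 5 ./test/lib/nvme/hotplug.sh
```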
The host system I am using is CentOS 7.2.
Isaac
From: SPDK [mailto:[email protected]] On Behalf Of Verkamp, Daniel
Sent: Wednesday, May 17, 2017 1:09 PM
To: Storage Performance Development Kit <[email protected]<mailto:[email protected]>>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug
Hi Isaac,
The version of the hotplug script in the repository (test/lib/nvme/hotplug.sh) is the
current version we are running in our automated test pool.
We haven't hit the -net/--netdev issue that you mentioned yet because the version of
qemu we are using is older (the current host system running this test is Fedora 25 with
qemu 2.7.1). It looks like we'll need to update the script for that. We would be
happy to accept a patch to hotplug.sh if the --netdev option also works on older qemu.
If the kernel crashes due to user program behavior, it sounds like there is a bug in the
kernel uio driver. We haven't seen this crash in our automated testing, so I am not
sure what the cause could be. It is also worth trying a newer kernel version (we are just
using Linux 4.5.5 because the test VM image hasn't been updated in a while).
-- Daniel
From: SPDK [mailto:[email protected]] On Behalf Of Isaac Otsiabah
Sent: Monday, May 15, 2017 1:43 PM
To: Storage Performance Development Kit <[email protected]<mailto:[email protected]>>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug
Daniel, please, do you have an updated version of the hotplug.sh script you can share with
us? I created the VM using this exact command on my CentOS 7 host:
IMAGE=/home/fedora24/fedora24.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
FEDORA_ISO=/tmp/Fedora-Server-dvd-x86_64-24-1.2.iso
/tmp/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 \
-hda $IMAGE \
-net nic,model=virtio \
-net bridge,br=br1 \
-netdev user,id=hotplug,hostfwd=tcp::10022-:22 \
-m ${MEM} \
-pidfile "/tmp/qemu_pid_fedora.txt" \
--enable-kvm \
-cpu host \
-chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
-mon chardev=mon0,mode=readline \
-cdrom $FEDORA_ISO
After the install is complete, I set up the guest IP address in
/etc/sysconfig/network-scripts/ifcfg-ens3 and bring up the interface with ./ifup ens3.
From the VM, I clone spdk and build it.
Then I run the group of tests in hotplug.sh, skipping the sections that create the VM
and copy spdk to the VM.
I mentioned the -netdev flag in my earlier email.
Isaac
From: Isaac Otsiabah
Sent: Monday, May 15, 2017 1:08 PM
To: Storage Performance Development Kit <[email protected]<mailto:[email protected]>>
Cc: Isaac Otsiabah <[email protected]<mailto:[email protected]>>; Paul Von-Stamwitz
<[email protected]<mailto:[email protected]>>; Edward Yang
<[email protected]<mailto:[email protected]>>
Subject: RE: VM crashes while testing SPDK hotplug
Daniel, I installed a Fedora 24 VM and tested it. After running the test twice or more, the
VM oopses. Unlike the previous failure on CentOS, this failure does not reboot the VM, but
the VM oopses after two or more test runs. My host is a CentOS machine. I found that the
qemu-kvm which comes with the OS installation does not support NVMe, so I built
qemu-system-x86_64 version 2.9.
[[email protected] spdk]# /tmp/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 -version
QEMU emulator version 2.9.0
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
One observation (although this is not the problem, because local port 10022 was not
responsive and I therefore executed scripts/setup.sh and the hotplug binary from the VM
console at the appropriate breakpoints): hotplug.sh has the flag "-net
user,hostfwd=tcp::10022-:22 \" to redirect guest ssh port 22 to host port 10022.
However, qemu-system-x86_64 version 2.9 does not have this option; it has the -netdev
option, but it is different. The qemu-system-x86_64 man page describes the -netdev flag
as follows:
-netdev user,id=str[,ipv4[=on|off]][,net=addr[/mask]][,host=addr]
[,ipv6[=on|off]][,ipv6-net=addr[/int]][,ipv6-host=addr]
[,restrict=on|off][,hostname=host][,dhcpstart=addr]
[,dns=addr][,ipv6-dns=addr][,dnssearch=domain][,tftp=dir]
[,bootfile=f][,hostfwd=rule][,guestfwd=rule][,smb=dir[,smbserver=addr]]
configure a user mode network backend with ID 'str',
its DHCP server and optional services
It says hostfwd=rule and does not give details of the rule. I used tcp, so I specified it
as
-netdev user,id=hotplug,hostfwd=tcp::10022-:22 \
From the host, "netstat -an |egrep -i listen |less" shows that
local port 10022 is being listened on. I installed sshpass and tested this -netdev
redirection with a simple sshpass command to the VM but got no response. Therefore, I
bypassed executing scripts/setup.sh and the hotplug binary using the sshpass command.
So I can test it without executing setup.sh and the hotplug binary through sshpass on port
10022. The main issue is: why does it oops after I run it 2 or more times?
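A note on the unresponsive port: as far as I know, -netdev only defines a host-side
backend, and the guest sees it only when a NIC device refers to the same id. In the
command above, the only NIC is the one attached to the bridge, which may explain why
port 10022 is listening on the host but ssh gets no reply. The sketch below builds a
paired set of options; user_net_args is a hypothetical helper, and virtio-net-pci is just
one NIC model that could be used.

```shell
# Hypothetical helper: print a user-mode network backend with a TCP port
# forward, plus a guest NIC that is attached to that backend by id.
user_net_args() {
    local id=$1
    local host_port=$2
    local guest_port=$3
    echo "-netdev user,id=$id,hostfwd=tcp::$host_port-:$guest_port -device virtio-net-pci,netdev=$id"
}

# Prints the options to add to the qemu-system-x86_64 command line.
user_net_args hotplug 10022 22
```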
Isaac
From: SPDK [mailto:[email protected]] On Behalf Of Verkamp, Daniel
Sent: Tuesday, May 09, 2017 3:33 PM
To: Storage Performance Development Kit <[email protected]<mailto:[email protected]>>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug
Hi Isaac,
Our hotplug tests with a VM (test/lib/nvme/hotplug.sh) are working with a Fedora 24 VM
guest running kernel 4.5.5. I suspect there is a bug in the CentOS kernel version (3.10
is fairly old and is probably missing uio/hotplug-related bug fixes from the mainline
kernels).
Can you try to reproduce your problem on a newer kernel version and see if that is the
cause of the issue?
Thanks,
-- Daniel
From: SPDK [mailto:[email protected]] On Behalf Of Isaac Otsiabah
Sent: Tuesday, May 9, 2017 2:11 PM
To: [email protected]<mailto:[email protected]>
Subject: [SPDK] VM crashes while testing SPDK hotplug
I created a VM on a CentOS 7 host with a listening socket on port 4449 and tested the
hotplug.
1. VM creation is as follows:
IMAGE=/home/centos7/centos72.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
ISO=/tmp/CentOS-7-x86_64-Everything-1611.iso
[[email protected]]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[[email protected]]# ls -l /tmp/CentOS-7-x86_64-Everything-1611.iso
-r--------. 1 qemu qemu 8280604672 Apr 12 13:37 /tmp/CentOS-7-x86_64-Everything-1611.iso
qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 \
-hda $IMAGE \
-net nic,model=virtio \
-net bridge,br=br1 \
-m ${MEM} \
-pidfile "/tmp/qemu_pid2.txt" \
--enable-kvm \
-cpu host \
-chardev socket,id=mon0,host=localhost,port=4449,ipv4,server,nowait \
-mon chardev=mon0,mode=readline \
-cdrom $ISO
2. Without running the SPDK hotplug application (i.e., examples/nvme/hotplug/hotplug -i 0
-t 15 -n 4 -r 8), the qemu commands to insert fake NVMe devices work; I can see the NVMe
devices in /dev/.
echo " drive_add 0 file=/root/test0,format=raw,id=drive0,if=none" | nc localhost 4449
echo " drive_add 1 file=/root/test1,format=raw,id=drive1,if=none" | nc localhost 4449
echo "drive_add 2 file=/root/test2,format=raw,id=drive2,if=none" | nc localhost 4449
echo "drive_add 3 file=/root/test3,format=raw,id=drive3,if=none" | nc localhost 4449
echo "device_add nvme,drive=drive0,id=nvme0,serial=nvme0" | nc localhost 4449
echo "device_add nvme,drive=drive1,id=nvme1,serial=nvme1" | nc localhost 4449
echo "device_add nvme,drive=drive2,id=nvme2,serial=nvme2" | nc localhost 4449
echo "device_add nvme,drive=drive3,id=nvme3,serial=nvme3" | nc localhost 4449
Also, the commands to delete the devices work without crashing the VM:
echo "device_del nvme0" | nc localhost 4449
echo "device_del nvme1" | nc localhost 4449
echo "device_del nvme2" | nc localhost 4449
echo "device_del nvme3" | nc localhost 4449
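The insert and delete commands above follow a fixed pattern, so they can be generated in
a loop. This is only a sketch: monitor_add_cmds and monitor_del_cmds are hypothetical
helpers that print the monitor commands, and they emit the drive_add and device_add for
each device together rather than all drives first.

```shell
# Hypothetical helper: print the monitor commands that attach N fake NVMe
# devices backed by /root/test0 .. /root/test(N-1).
monitor_add_cmds() {
    local n=$1
    local i=0
    while [ "$i" -lt "$n" ]; do
        echo "drive_add $i file=/root/test$i,format=raw,id=drive$i,if=none"
        echo "device_add nvme,drive=drive$i,id=nvme$i,serial=nvme$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: print the monitor commands that remove them again.
monitor_del_cmds() {
    local n=$1
    local i=0
    while [ "$i" -lt "$n" ]; do
        echo "device_del nvme$i"
        i=$((i + 1))
    done
}

# Usage, sending one command per connection as in the commands above:
#   monitor_add_cmds 4 | while read -r cmd; do echo "$cmd" | nc localhost 4449; done
```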
3. However, with the SPDK hotplug test application (examples/nvme/hotplug/hotplug -i 0
-t 15 -n 4 -r 8) running, the device_del command causes a fault that crashes the VM, and
it reboots as a result. The /var/log/messages and vmcore-dmesg.txt files are in the
attached tar file. I would appreciate any help in understanding why a bug in SPDK crashes
the VM. Thanks.
Isaac