Thanks for your enlightening suggestions,
1. We are afraid that we cannot do a secure erase at the moment. The device is shared, and we
need to coordinate with other colleagues before erasing it.
2. We did not consider the influence of the device state carefully, and we think you make
We will provide a new test report for the FOB state to verify your suggestion.
3. Our test demo is based on spdk/examples/nvme/hello_world/hello_world.c, and the
I/O granularity is set to 4K.
4. Thanks for sharing your knowledge ^_^
At 2017-08-09 13:34:17, "Crane Chu" <cranechu(a)gmail.com> wrote:
Here are some of my suggestions:
1. Try Secure Erase before the test. It will eliminate most of the FTL background operations
and put the SSD into the so-called FOB (fresh out of box) state.
2. Most enterprise-grade SSDs provide Quality of Service figures in their specifications. For
example, the P3608 ensures that 99.99% of 4K write I/Os complete within 4.7 ms when QD=1. The
spec also provides another write latency value, such as 20 us. You can consider them the worst
case and the best case respectively. But the distribution between them is hard to predict in a
non-FOB state, due to the diversity of firmware, host workload patterns, and NAND flash
characteristics.
3. The mapping granularity is 4K in most SSD firmware designs. So, if the LBA size is 512B in
your test, it will make the scenario more complex. Try to test with 4K-aligned I/O.
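To make the 4K alignment point concrete, here is a minimal sketch of the check. It assumes a 4096-byte mapping unit, which is common but not guaranteed for any particular drive; the constant MAPPING_UNIT_BYTES and the helper is_mapping_aligned are hypothetical names used only for illustration.

```python
# Assumed FTL mapping granularity; real devices may differ.
MAPPING_UNIT_BYTES = 4096


def is_mapping_aligned(start_lba, lba_count, lba_size_bytes):
    """Return True if an I/O starts and ends on a mapping-unit boundary."""
    offset = start_lba * lba_size_bytes
    length = lba_count * lba_size_bytes
    return (
        length > 0
        and offset % MAPPING_UNIT_BYTES == 0
        and length % MAPPING_UNIT_BYTES == 0
    )


# With 512B LBAs, eight LBAs starting at LBA 8 cover exactly one 4K unit.
aligned_512 = is_mapping_aligned(start_lba=8, lba_count=8, lba_size_bytes=512)
# The same length starting at LBA 1 straddles two mapping units.
straddling_512 = is_mapping_aligned(start_lba=1, lba_count=8, lba_size_bytes=512)
```

An I/O that fails this check would likely force the firmware to read, modify, and rewrite a whole mapping unit, which is one way a 512B LBA size can complicate the results.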
On Mon, Aug 7, 2017 at 2:40 PM, 储 <cjj25233(a)163.com> wrote:
We truly appreciate your concrete analysis and help in resolving the problem.
As you say, different devices have different and complex mechanisms.
We do not know how the hardware processes each request actually, but we think you make
Maybe it will be alleviated as the hardware is upgraded and developed further.
We hope that we can get access to other devices and verify whether this problem exists in the
We also hope that others who use other devices can share their test results about this
Thanks a lot.
At 2017-08-04 07:36:07, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
On Wed, 2017-08-02 at 19:13 +0800, 储 wrote:
> (1) "access" = write. We experimented with both read and write operations
> but only found the strange phenomenon in the write experiments.
> The comparison of the experiments can be seen in the attachments.
> (2) We use a NAND based SSD, Intel P3608.
> (3) The result presented in the attachments was produced with no delay.
> We tried to add "sleep(1)" between the two operations, but it does not
> seem to help.
> At 2017-08-02 07:49:13, "Walker, Benjamin" wrote:
> > Hi Jiajia,
> > I have a bunch of questions that will help me figure out what you are
> > seeing.
> > 1) When you say "access", do you mean read or write? The behavior of
> > the two operations is quite different.
> > 2) Are you using a NAND based or 3D XPoint based SSD? These again work
> > entirely differently.
> > 3) When you access the same block repeatedly, what's the delay between each
> > access? None?
I was able to verify the behavior you are seeing. I'm afraid I'm not going to be
able to give you an exact answer for your particular device - I don't have
insight into the specifics of how each SSD is implemented. I brainstormed with a
few of my colleagues though, so what I can do is give you some idea of what is
happening inside of the device that will make it clear why writing to the same
block over and over may cause performance problems.
A good mental model for an SSD is basically a log of (LBA, data) pairs. When you
write to any LBA, it just appends to the end of the log and updates an internal
map of the location of that LBA. It does this appending by buffering several
writes into RAM located on the SSD, then it sends that batch of data to the NAND
all at once. The other important understanding is that the SSD is composed of a
large number of physical NAND dies, with some number of entirely parallel NAND
channels that can handle writes. Writing to the log sends the batched data to
each channel more or less round-robin. The final thing to remember is that this
whole process is implemented in hardware, not software, so adding things like
coordination between parallel operations is not as simple as just adding a lock.
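The mental model above can be sketched in a few lines. This is only a toy illustration of the log-plus-map idea, not how any real SSD controller is built; the class LogStructuredSSD, its fields, and the buffer size are all hypothetical.

```python
class LogStructuredSSD:
    """Toy model: an append-only log of (LBA, data) pairs plus an LBA map."""

    def __init__(self, buffer_entries=4):
        self.log = []        # stands in for NAND: flushed (lba, data) pairs
        self.buffer = []     # stands in for the RAM located on the SSD
        self.lba_map = {}    # lba -> index of its newest entry in self.log
        self.buffer_entries = buffer_entries

    def write(self, lba, data):
        # Every write is buffered first, even if the LBA was written before.
        self.buffer.append((lba, data))
        if len(self.buffer) >= self.buffer_entries:
            self.flush()

    def flush(self):
        # Send the whole batch to the "NAND" at once and update the map.
        for lba, data in self.buffer:
            self.log.append((lba, data))
            self.lba_map[lba] = len(self.log) - 1
        self.buffer = []

    def read(self, lba):
        # The newest buffered copy wins over anything already flushed.
        for buffered_lba, data in reversed(self.buffer):
            if buffered_lba == lba:
                return data
        index = self.lba_map.get(lba)
        return None if index is None else self.log[index][1]


ssd = LogStructuredSSD(buffer_entries=2)
ssd.write(7, b"old")
ssd.write(7, b"new")      # buffer is full, so both entries are flushed
latest = ssd.read(7)      # the map points at the newest log entry
```

Note that overwriting LBA 7 does not modify the earlier log entry; it appends a second one and moves the map. The sketch leaves out the parallel channels and garbage collection entirely.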
When you write the same LBA over and over, a few things could happen inside the
SSD (I don't know how your SSD specifically works).
One possibility is that the SSD could see that the LBA is already buffered in
memory from a previous write and it could just update that memory. However, that
doesn't actually work in general. The data in that memory buffer may be
currently in use as part of a write to actual NAND, or may even be currently
being read. So the only option is to append to the end of the log for each new
write to the LBA. This could probably be coordinated with locking in software,
but remember that the SSD controller is implemented in hardware. If handling
this case makes the design far more complex, it may not be possible given power,
latency, and other budgets.
Another possibility is that the data is appended to the log for each write just
like any other I/O. However, it is still more complicated than the case where
random LBAs are being written to. Once one buffer is filled up, a write to NAND
is issued. When that write completes, it has to update the map for the location
of the LBA. If, while that write is outstanding, another buffer fills up with
new writes to the same LBA, the device has to figure out what to do. If it
submits the second NAND write to a new channel, it's then effectively racing
against the first write. If they complete out of order, the user will end up
with stale data. This case could also probably be handled by better coordination
on the completion side, but again there is a complexity trade-off when
implementing this in actual hardware.
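The race described above can be illustrated with a small sketch. It assumes each write carries a sequence number that the completion path can compare, which is just one possible form of "better coordination"; the function apply_completions and the location strings are hypothetical.

```python
def apply_completions(completions, check_sequence):
    """Apply NAND write completions to an LBA map, in completion order.

    completions: list of (sequence_number, lba, nand_location) tuples.
    check_sequence: if True, ignore a completion older than the mapped one.
    """
    lba_map = {}  # lba -> (sequence_number, nand_location)
    for seq, lba, location in completions:
        current = lba_map.get(lba)
        if check_sequence and current is not None and current[0] > seq:
            continue  # an older write finished late; keep the newer mapping
        lba_map[lba] = (seq, location)
    return {lba: entry[1] for lba, entry in lba_map.items()}


# Write 1 (older) went to channel 0 and write 2 (newer) to channel 1.
# Write 2 happens to complete first.
out_of_order = [(2, 100, "ch1:page9"), (1, 100, "ch0:page4")]

naive = apply_completions(out_of_order, check_sequence=False)
guarded = apply_completions(out_of_order, check_sequence=True)
```

Without the check, the map for LBA 100 ends up pointing at the older data on channel 0. With the check, it keeps the newer location. In software this is one comparison; the point of the paragraph above is that the equivalent logic in hardware has a real cost.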
The easiest solution is probably to just detect if a NAND write is active for an
LBA in a given buffer, and then just queue up the next write until the one
before it finishes. That adds potentially a lot of latency, but it simplifies
the hardware design considerably.
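To get a feel for how much latency this serialization could add, here is a rough timing sketch. The channel count and the 100 us NAND write time are made-up numbers, and the function total_time_us is hypothetical; it only models "a write to an LBA waits for the previous NAND write to that LBA".

```python
def total_time_us(lbas, channels, nand_write_us):
    """Time to finish all writes when same-LBA writes are serialized."""
    channel_free_at = [0.0] * channels   # when each channel becomes idle
    lba_busy_until = {}                  # when the last write to an LBA ends
    finish = 0.0
    for i, lba in enumerate(lbas):
        ch = i % channels                # round-robin channel selection
        # Wait for the channel and for any in-flight write to the same LBA.
        start = max(channel_free_at[ch], lba_busy_until.get(lba, 0.0))
        done = start + nand_write_us
        channel_free_at[ch] = done
        lba_busy_until[lba] = done
        finish = max(finish, done)
    return finish


# Eight writes to distinct LBAs spread across four channels.
distinct_lbas = total_time_us(range(8), channels=4, nand_write_us=100.0)
# Eight writes to the same LBA: each one waits for the one before it.
same_lba = total_time_us([5] * 8, channels=4, nand_write_us=100.0)
```

Under these assumptions the distinct-LBA case takes 200 us while the same-LBA case takes 800 us, because the parallel channels no longer help.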
Ultimately, I have no idea what that SSD is actually doing, but you can see that
it's fairly complex to handle this case. It is certainly more complex than
handling random I/O.
I hope that helps,