[LSF/MM TOPIC] The end of the DAX experiment
by Dan Williams
Before people get too excited, this isn't a proposal to kill DAX. The
topic proposal is a discussion to resolve lingering open questions
that currently motivate ext4 and xfs to scream "EXPERIMENTAL" when the
current DAX facilities are enabled. There are 2 primary concerns to
resolve: enumerate the remaining features/fixes, and identify a path
to implement it all without regressing any existing application use
cases.
An enumeration of remaining projects follows, please expand this list
if I missed something:
* "DAX" has no specific meaning by itself, users have 2 use cases for
"DAX" capabilities: userspace cache management via MAP_SYNC, and page
cache avoidance, where the latter has no current API to discover or
use it. The project is to supplement MAP_SYNC with a MAP_DIRECT
facility, plus MADV_SYNC / MADV_DIRECT to indicate the same
dynamically via madvise. Similar to O_DIRECT, MAP_DIRECT would be an
application hint to avoid / minimize page cache usage, but with no
strict guarantee like the one MAP_SYNC provides.
* Resolve all "if (dax) goto fail;" patterns in the kernel. Outside of
longterm-GUP (a topic in its own right) the projects here are
XFS-reflink and XFS-realtime-device support. DAX+reflink effectively
requires a given physical page to be mapped into two different inodes
at different (page->index) offsets. The challenge is to support
DAX-reflink without violating any existing application visible
semantics. The operating assumption / strawman to debate is that
experimental status is not blanket permission to go change existing
semantics in backwards incompatible ways.
* Deprecate, but not remove, the DAX mount option. Too many flows
depend on the option so it will never go away, but the facility is too
coarse. Provide an option to enable MAP_SYNC and
more-likely-to-do-something-useful-MAP_DIRECT on a per-directory
basis. The current proposal is to allow this property to only be
toggled while the directory is empty to avoid the complications of
racing page invalidation with new DAX mappings.
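As a point of reference for the MAP_SYNC half of the first item, here is a
minimal userspace sketch of how an application can ask for MAP_SYNC today and
fall back to an ordinary shared mapping when the file is not DAX-capable. The
helper name map_prefer_sync() is hypothetical, and the fallback #defines are
only there for older libc headers. MAP_DIRECT and the MADV_* flags proposed
above do not exist, so they are deliberately not shown.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallback definitions for older libc headers (values from the Linux uapi). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: map 'len' bytes of 'fd' shared and writable, asking
 * for MAP_SYNC first. *is_sync is set to 1 if the kernel granted MAP_SYNC
 * and to 0 otherwise. Returns MAP_FAILED on error.
 */
static void *map_prefer_sync(int fd, size_t len, int *is_sync)
{
	void *addr;

	/* MAP_SHARED_VALIDATE makes the kernel reject flags it cannot honor. */
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (addr != MAP_FAILED) {
		*is_sync = 1;
		return addr;
	}

	*is_sync = 0;
	/* Only "flag not supported" style errors justify a fallback. */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return MAP_FAILED;

	/* Plain shared mapping: the caller must msync()/fsync() for durability. */
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

The caller only learns whether it got the strict MAP_SYNC guarantee through
is_sync; nothing comparable exists for the page cache avoidance use case.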
Secondary projects, i.e. important ones that I would submit are not in
the critical path to removing the "experimental" designation:
* Filesystem-integrated badblock management. Hook up the media error
notifications from libnvdimm to the filesystem to allow for operations
like "list files with media errors" and "enumerate bad file offsets on
a granularity smaller than a page". Another consideration along these
lines is to integrate machine-check-handling and dynamic error
notification into a filesystem interface. I've heard complaints that
the sigaction() based mechanism to receive BUS_MCEERR_* information,
while sufficient for the "System RAM" use case, is not precise enough
for the "Persistent Memory / DAX" use case where errors are repairable
and sub-page error information is useful.
* Userfaultfd for file-backed mappings and DAX
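For context on the sigaction() complaint in the badblocks item, this is
roughly what the existing mechanism gives an application. It is a sketch
only: the handler, the globals and mce_range_bytes() are illustrative names,
and a real handler would need to do more than record the fault.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Last memory error seen by the handler (illustrative globals). */
static void *volatile last_mce_addr;
static volatile size_t last_mce_len;

/* si_addr_lsb is log2 of the size of the range reported as affected. */
static size_t mce_range_bytes(short addr_lsb)
{
	return (size_t)1 << addr_lsb;
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	/* Only machine-check style SIGBUS carries the fields used below. */
	if (info->si_code == BUS_MCEERR_AR || info->si_code == BUS_MCEERR_AO) {
		last_mce_addr = info->si_addr;
		last_mce_len = mce_range_bytes(info->si_addr_lsb);
	}
}

/* Register the handler with SA_SIGINFO so siginfo_t is delivered. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

The only granularity information available here is si_addr_lsb, which, as far
as I know, is reported at page granularity or coarser. That is the gap when
sub-page error information is wanted.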
Ideally all the usual DAX, persistent memory, and GUP suspects could
be in the room to discuss this:
* Jan Kara
* Dave Chinner
* Christoph Hellwig
* Jeff Moyer
* Johannes Thumshirn
* Matthew Wilcox
* John Hubbard
* Jérôme Glisse
* MM folks for the reflink vs 'struct page' vs Xarray considerations
1 year, 2 months
[RFC v3 00/19] kunit: introduce KUnit, the Linux kernel unit testing framework
by Brendan Higgins
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily, solving the classic problem
of difficulty in exercising error handling code.
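To make the point about error handling concrete, here is a small sketch in
plain userspace C. It does not use the KUnit API from this patch set; all
names in it (dup_buf, failing_alloc, the test functions) are made up for
illustration. The unit under test takes its allocator as a parameter, so the
test fully controls that dependency and can force the failure path on demand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The dependency is passed in so that a test can control it. */
typedef void *(*alloc_fn)(size_t size);

/* Unit under test: duplicate a buffer, returning NULL if allocation fails. */
static void *dup_buf(const void *src, size_t len, alloc_fn alloc)
{
	void *copy = alloc(len);

	if (!copy)
		return NULL;
	memcpy(copy, src, len);
	return copy;
}

/* Test double: an allocator that always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Normal path, using the real allocator. */
static void test_dup_buf_copies_data(void)
{
	char *copy = dup_buf("abc", 4, malloc);

	assert(copy != NULL);
	assert(strcmp(copy, "abc") == 0);
	free(copy);
}

/* Error path, reached deterministically through the test double. */
static void test_dup_buf_handles_alloc_failure(void)
{
	assert(dup_buf("abc", 4, failing_alloc) == NULL);
}
```

Both tests run in microseconds and cannot fail for reasons outside the test,
which is the property the section above is describing.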
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/4.19/v3
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/4.19/v3 branch.
## Changes Since Last Version
- Changed namespace prefix from `test_*` to `kunit_*` as requested by
Shuah.
- Started converting/cleaning up the device tree unittest to use KUnit.
- Started adding KUnit expectations with custom messages.
--
2.20.0.rc0.387.gc7a69e6b6c-goog
1 year, 5 months
[PATCH] libnvdimm, namespace: check nsblk->uuid immediately after its allocation
by Wei Yang
When creating nd_namespace_blk, its uuid is copied from nd_label->uuid.
In case the memory allocation fails, it goes to the error branch.
This check is better to be done immediately after memory allocation,
while current implementation does this after assigning claim_class.
This patch moves the check immediately after uuid allocation.
Signed-off-by: Wei Yang <richardw.yang(a)linux.intel.com>
---
drivers/nvdimm/namespace_devs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 681af3a8fd62..9471b9ca04f5 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2240,11 +2240,11 @@ static struct device *create_namespace_blk(struct nd_region *nd_region,
nsblk->lbasize = __le64_to_cpu(nd_label->lbasize);
nsblk->uuid = kmemdup(nd_label->uuid, NSLABEL_UUID_LEN,
GFP_KERNEL);
+ if (!nsblk->uuid)
+ goto blk_err;
if (namespace_label_has(ndd, abstraction_guid))
nsblk->common.claim_class
= to_nvdimm_cclass(&nd_label->abstraction_guid);
- if (!nsblk->uuid)
- goto blk_err;
memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
if (name[0])
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
--
2.19.1
1 year, 8 months
[RFC v4 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework
by Brendan Higgins
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily, solving the classic problem
of difficulty in exercising error handling code.
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/5.0-rc5/v4
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/5.0-rc5/v4 branch.
## Changes Since Last Version
- Got KUnit working on (hypothetically) all architectures (tested on
x86), as per Rob's (and others') request
- Punting all KUnit features/patches depending on UML for now.
- Broke out UML specific support into arch/um/* as per "[RFC v3 01/19]
kunit: test: add KUnit test runner core", as requested by Luis.
- Added support to kunit_tool to allow it to build kernels in external
directories, as suggested by Kieran.
- Added a UML defconfig, and a config fragment for KUnit as suggested
by Kieran and Luis.
- Cleaned up, and reformatted a bunch of stuff.
--
2.21.0.rc0.258.g878e2cd30e-goog
1 year, 11 months
[ndctl PATCH v2 4/4] ndctl, monitor: support NVDIMM_FAMILY_HYPERV
by Dexuan Cui
Currently "ndctl monitor" fails for NVDIMM_FAMILY_HYPERV due to
"no smart support".
NVDIMM_FAMILY_HYPERV doesn't use ND_CMD_SMART to get the health info.
Instead, it uses ND_CMD_CALL, so the checking here can't apply, and it
doesn't support threshold alarms -- actually it only supports one event:
ND_EVENT_HEALTH_STATE.
See http://www.uefi.org/RFIC_LIST ("Virtual NVDIMM 0x1901").
Let's skip the unnecessary checking for NVDIMM_FAMILY_HYPERV, and make
sure we only monitor the "dimm-health-state" event and ignore the others.
With the patch, when an error happens, we log it with a message such as:
{"timestamp":"1550547497.431731497","pid":1571,"event":
{"dimm-health-state":true},"dimm":{"dev":"nmem1",
"id":"04d5-01-1701-01000000","handle":1,"phys_id":0,
"health":{"health_state":"fatal","shutdown_count":8}}}
Here the meaningful info is:
"health":{"health_state":"fatal","shutdown_count":8}
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
ndctl/monitor.c | 42 +++++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 43b2abe..43beb06 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -265,31 +265,59 @@ static bool filter_region(struct ndctl_region *region,
return true;
}
-static void filter_dimm(struct ndctl_dimm *dimm, struct util_filter_ctx *fctx)
+static bool ndctl_dimm_test_and_enable_notification(struct ndctl_dimm *dimm)
{
- struct monitor_dimm *mdimm;
- struct monitor_filter_arg *mfa = fctx->monitor;
const char *name = ndctl_dimm_get_devname(dimm);
+ /*
+ * Hyper-V Virtual NVDIMM doesn't use ND_CMD_SMART to get the health
+ * info. Instead, it uses ND_CMD_CALL, so the checking here can't
+ * apply, and it doesn't support threshold alarms -- actually it only
+ * supports one event: ND_EVENT_HEALTH_STATE.
+ */
+ if (ndctl_dimm_get_cmd_family(dimm) == NVDIMM_FAMILY_HYPERV) {
+ if (monitor.event_flags != ND_EVENT_HEALTH_STATE) {
+ monitor.event_flags = ND_EVENT_HEALTH_STATE;
+
+ notice(&monitor,
+ "%s: only dimm-health-state can be monitored\n",
+ name);
+ }
+ return true;
+ }
+
if (!ndctl_dimm_is_cmd_supported(dimm, ND_CMD_SMART)) {
err(&monitor, "%s: no smart support\n", name);
- return;
+ return false;
}
if (!ndctl_dimm_is_cmd_supported(dimm, ND_CMD_SMART_THRESHOLD)) {
err(&monitor, "%s: no smart threshold support\n", name);
- return;
+ return false;
}
if (!ndctl_dimm_is_flag_supported(dimm, ND_SMART_ALARM_VALID)) {
err(&monitor, "%s: smart alarm invalid\n", name);
- return;
+ return false;
}
if (enable_dimm_supported_threshold_alarms(dimm)) {
err(&monitor, "%s: enable supported threshold alarms failed\n", name);
- return;
+ return false;
}
+ return true;
+}
+
+static void filter_dimm(struct ndctl_dimm *dimm, struct util_filter_ctx *fctx)
+{
+ struct monitor_dimm *mdimm;
+ struct monitor_filter_arg *mfa = fctx->monitor;
+ const char *name = ndctl_dimm_get_devname(dimm);
+
+
+ if (!ndctl_dimm_test_and_enable_notification(dimm))
+ return;
+
mdimm = calloc(1, sizeof(struct monitor_dimm));
if (!mdimm) {
err(&monitor, "%s: calloc for monitor dimm failed\n", name);
--
2.19.1
1 year, 11 months
[ndctl PATCH v2 3/4] ndctl, lib: implement ndctl_dimm_get_cmd_family()
by Dexuan Cui
Let's export the family info so we can do some family-specific
handling in ndctl/monitor.c for Hyper-V NVDIMM.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
ndctl/lib/libndctl.c | 5 +++++
ndctl/lib/libndctl.sym | 1 +
ndctl/libndctl.h | 1 +
3 files changed, 7 insertions(+)
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index 48bdb27..1186579 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -1550,6 +1550,11 @@ NDCTL_EXPORT struct ndctl_dimm *ndctl_dimm_get_next(struct ndctl_dimm *dimm)
return list_next(&bus->dimms, dimm, list);
}
+NDCTL_EXPORT unsigned long ndctl_dimm_get_cmd_family(struct ndctl_dimm *dimm)
+{
+ return dimm->cmd_family;
+}
+
NDCTL_EXPORT unsigned int ndctl_dimm_get_handle(struct ndctl_dimm *dimm)
{
return dimm->handle;
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index cb9f769..470e895 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -38,6 +38,7 @@ global:
ndctl_bus_wait_probe;
ndctl_dimm_get_first;
ndctl_dimm_get_next;
+ ndctl_dimm_get_cmd_family;
ndctl_dimm_get_handle;
ndctl_dimm_get_phys_id;
ndctl_dimm_get_vendor;
diff --git a/ndctl/libndctl.h b/ndctl/libndctl.h
index 0debdb6..cb5a8fc 100644
--- a/ndctl/libndctl.h
+++ b/ndctl/libndctl.h
@@ -145,6 +145,7 @@ struct ndctl_dimm *ndctl_dimm_get_next(struct ndctl_dimm *dimm);
for (dimm = ndctl_dimm_get_first(bus); \
dimm != NULL; \
dimm = ndctl_dimm_get_next(dimm))
+unsigned long ndctl_dimm_get_cmd_family(struct ndctl_dimm *dimm);
unsigned int ndctl_dimm_get_handle(struct ndctl_dimm *dimm);
unsigned short ndctl_dimm_get_phys_id(struct ndctl_dimm *dimm);
unsigned short ndctl_dimm_get_vendor(struct ndctl_dimm *dimm);
--
2.19.1
1 year, 11 months
[ndctl PATCH v2 2/4] libndctl: NVDIMM_FAMILY_HYPERV: add .smart_get_shutdown_count (Function 2)
by Dexuan Cui
With the patch, "ndctl list --dimms --health --idle" can show
"shutdown_count" now, e.g.
{
"dev":"nmem0",
"id":"04d5-01-1701-00000000",
"handle":0,
"phys_id":0,
"health":{
"health_state":"ok",
"shutdown_count":2
}
}
The patch has to directly call ndctl_cmd_submit() in
hyperv_cmd_smart_get_flags() and hyperv_cmd_smart_get_shutdown_count() to
get the needed info, because util_dimm_health_to_json() only submits *one*
command, and unluckily for Hyper-V Virtual NVDIMM we need to call both
Function 1 and 2 to get the needed info.
My feeling is that it's not very good to directly call ndctl_cmd_submit(),
but by doing this we don't need to make any change to the common code, and
I'm unsure if it's good to change the common code just for Hyper-V.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
ndctl/lib/hyperv.c | 62 ++++++++++++++++++++++++++++++++++++++++------
ndctl/lib/hyperv.h | 7 ++++++
2 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/ndctl/lib/hyperv.c b/ndctl/lib/hyperv.c
index b303d50..e8ec142 100644
--- a/ndctl/lib/hyperv.c
+++ b/ndctl/lib/hyperv.c
@@ -22,7 +22,8 @@
#define CMD_HYPERV_STATUS(_c) (CMD_HYPERV(_c)->u.status)
#define CMD_HYPERV_SMART_DATA(_c) (CMD_HYPERV(_c)->u.smart.data)
-static struct ndctl_cmd *hyperv_dimm_cmd_new_smart(struct ndctl_dimm *dimm)
+static struct ndctl_cmd *hyperv_dimm_cmd_new_cmd(struct ndctl_dimm *dimm,
+ unsigned int command)
{
struct ndctl_bus *bus = ndctl_dimm_get_bus(dimm);
struct ndctl_ctx *ctx = ndctl_bus_get_ctx(bus);
@@ -35,8 +36,7 @@ static struct ndctl_cmd *hyperv_dimm_cmd_new_smart(struct ndctl_dimm *dimm)
return NULL;
}
- if (test_dimm_dsm(dimm, ND_HYPERV_CMD_GET_HEALTH_INFO) ==
- DIMM_DSM_UNSUPPORTED) {
+ if (test_dimm_dsm(dimm, command) == DIMM_DSM_UNSUPPORTED) {
dbg(ctx, "unsupported function\n");
return NULL;
}
@@ -54,7 +54,7 @@ static struct ndctl_cmd *hyperv_dimm_cmd_new_smart(struct ndctl_dimm *dimm)
hyperv = CMD_HYPERV(cmd);
hyperv->gen.nd_family = NVDIMM_FAMILY_HYPERV;
- hyperv->gen.nd_command = ND_HYPERV_CMD_GET_HEALTH_INFO;
+ hyperv->gen.nd_command = command;
hyperv->gen.nd_fw_size = 0;
hyperv->gen.nd_size_in = offsetof(struct nd_hyperv_smart, status);
hyperv->gen.nd_size_out = sizeof(hyperv->u.smart);
@@ -65,34 +65,74 @@ static struct ndctl_cmd *hyperv_dimm_cmd_new_smart(struct ndctl_dimm *dimm)
return cmd;
}
-static int hyperv_smart_valid(struct ndctl_cmd *cmd)
+static struct ndctl_cmd *hyperv_dimm_cmd_new_smart(struct ndctl_dimm *dimm)
+{
+ return hyperv_dimm_cmd_new_cmd(dimm, ND_HYPERV_CMD_GET_HEALTH_INFO);
+}
+
+static int hyperv_cmd_valid(struct ndctl_cmd *cmd, unsigned int command)
{
if (cmd->type != ND_CMD_CALL ||
cmd->size != sizeof(*cmd) + sizeof(struct nd_pkg_hyperv) ||
CMD_HYPERV(cmd)->gen.nd_family != NVDIMM_FAMILY_HYPERV ||
- CMD_HYPERV(cmd)->gen.nd_command != ND_HYPERV_CMD_GET_HEALTH_INFO ||
+ CMD_HYPERV(cmd)->gen.nd_command != command ||
cmd->status != 0 ||
CMD_HYPERV_STATUS(cmd) != 0)
return cmd->status < 0 ? cmd->status : -EINVAL;
return 0;
}
+static int hyperv_smart_valid(struct ndctl_cmd *cmd)
+{
+ return hyperv_cmd_valid(cmd, ND_HYPERV_CMD_GET_HEALTH_INFO);
+}
+
static int hyperv_cmd_xlat_firmware_status(struct ndctl_cmd *cmd)
{
return CMD_HYPERV_STATUS(cmd) == 0 ? 0 : -EINVAL;
}
+static int hyperv_get_shutdown_count(struct ndctl_cmd *cmd,
+ unsigned int *count)
+{
+ unsigned int command = ND_HYPERV_CMD_GET_SHUTDOWN_INFO;
+ struct ndctl_cmd *cmd_get_shutdown_info;
+ int rc;
+
+ cmd_get_shutdown_info = hyperv_dimm_cmd_new_cmd(cmd->dimm, command);
+ if (!cmd_get_shutdown_info)
+ return -EINVAL;
+
+ if (ndctl_cmd_submit(cmd_get_shutdown_info) < 0 ||
+ hyperv_cmd_valid(cmd_get_shutdown_info, command) < 0) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+ *count = CMD_HYPERV(cmd_get_shutdown_info)->u.shutdown_info.count;
+ rc = 0;
+out:
+ ndctl_cmd_unref(cmd_get_shutdown_info);
+ return rc;
+}
+
static unsigned int hyperv_cmd_smart_get_flags(struct ndctl_cmd *cmd)
{
int rc;
+ unsigned int count;
+ unsigned int flags = 0;
rc = hyperv_smart_valid(cmd);
if (rc < 0) {
errno = -rc;
return 0;
}
+ flags |= ND_SMART_HEALTH_VALID;
- return ND_SMART_HEALTH_VALID;
+ if (hyperv_get_shutdown_count(cmd, &count) == 0)
+ flags |= ND_SMART_SHUTDOWN_COUNT_VALID;
+
+ return flags;
}
static unsigned int hyperv_cmd_smart_get_health(struct ndctl_cmd *cmd)
@@ -121,9 +161,17 @@ static unsigned int hyperv_cmd_smart_get_health(struct ndctl_cmd *cmd)
return health;
}
+static unsigned int hyperv_cmd_smart_get_shutdown_count(struct ndctl_cmd *cmd)
+{
+ unsigned int count;
+
+ return hyperv_get_shutdown_count(cmd, &count) == 0 ? count : UINT_MAX;
+}
+
struct ndctl_dimm_ops * const hyperv_dimm_ops = &(struct ndctl_dimm_ops) {
.new_smart = hyperv_dimm_cmd_new_smart,
.smart_get_flags = hyperv_cmd_smart_get_flags,
.smart_get_health = hyperv_cmd_smart_get_health,
+ .smart_get_shutdown_count = hyperv_cmd_smart_get_shutdown_count,
.xlat_firmware_status = hyperv_cmd_xlat_firmware_status,
};
diff --git a/ndctl/lib/hyperv.h b/ndctl/lib/hyperv.h
index 8e55a97..5232d60 100644
--- a/ndctl/lib/hyperv.h
+++ b/ndctl/lib/hyperv.h
@@ -19,6 +19,7 @@ enum {
/* non-root commands */
ND_HYPERV_CMD_GET_HEALTH_INFO = 1,
+ ND_HYPERV_CMD_GET_SHUTDOWN_INFO = 2,
};
/*
@@ -38,9 +39,15 @@ struct nd_hyperv_smart {
};
} __attribute__((packed));
+struct nd_hyperv_shutdown_info {
+ __u32 status;
+ __u32 count;
+} __attribute__((packed));
+
union nd_hyperv_cmd {
__u32 status;
struct nd_hyperv_smart smart;
+ struct nd_hyperv_shutdown_info shutdown_info;
} __attribute__((packed));
struct nd_pkg_hyperv {
--
2.19.1
1 year, 11 months
[ndctl PATCH v2 1/4] libndctl: add support for NVDIMM_FAMILY_HYPERV's _DSM Function 1
by Dexuan Cui
This patch retrieves the health info by Hyper-V _DSM method Function 1:
Get Health Information (Function Index 1)
See http://www.uefi.org/RFIC_LIST ("Virtual NVDIMM 0x1901").
Now "ndctl list --dimms --health --idle" can show a line "health_state":"ok",
e.g.
{
"dev":"nmem0",
"id":"04d5-01-1701-00000000",
"handle":0,
"phys_id":0,
"health":{
"health_state":"ok"
}
}
If there is an error with the NVDIMM, the "ok" will be replaced with "unknown",
"fatal", "critical", or "non-critical".
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
ndctl/lib/Makefile.am | 1 +
ndctl/lib/hyperv.c | 129 ++++++++++++++++++++++++++++++++++++++++++
ndctl/lib/hyperv.h | 51 +++++++++++++++++
ndctl/lib/libndctl.c | 2 +
ndctl/lib/private.h | 3 +
ndctl/ndctl.h | 1 +
6 files changed, 187 insertions(+)
create mode 100644 ndctl/lib/hyperv.c
create mode 100644 ndctl/lib/hyperv.h
diff --git a/ndctl/lib/Makefile.am b/ndctl/lib/Makefile.am
index 7797039..fb75fda 100644
--- a/ndctl/lib/Makefile.am
+++ b/ndctl/lib/Makefile.am
@@ -20,6 +20,7 @@ libndctl_la_SOURCES =\
intel.c \
hpe1.c \
msft.c \
+ hyperv.c \
ars.c \
firmware.c \
libndctl.c
diff --git a/ndctl/lib/hyperv.c b/ndctl/lib/hyperv.c
new file mode 100644
index 0000000..b303d50
--- /dev/null
+++ b/ndctl/lib/hyperv.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#include <stdlib.h>
+#include <limits.h>
+#include <util/bitmap.h>
+#include <util/log.h>
+#include <ndctl/libndctl.h>
+#include "private.h"
+#include "hyperv.h"
+
+#define CMD_HYPERV(_c) ((_c)->hyperv)
+#define CMD_HYPERV_STATUS(_c) (CMD_HYPERV(_c)->u.status)
+#define CMD_HYPERV_SMART_DATA(_c) (CMD_HYPERV(_c)->u.smart.data)
+
+static struct ndctl_cmd *hyperv_dimm_cmd_new_smart(struct ndctl_dimm *dimm)
+{
+ struct ndctl_bus *bus = ndctl_dimm_get_bus(dimm);
+ struct ndctl_ctx *ctx = ndctl_bus_get_ctx(bus);
+ struct ndctl_cmd *cmd;
+ size_t size;
+ struct nd_pkg_hyperv *hyperv;
+
+ if (!ndctl_dimm_is_cmd_supported(dimm, ND_CMD_CALL)) {
+ dbg(ctx, "unsupported cmd\n");
+ return NULL;
+ }
+
+ if (test_dimm_dsm(dimm, ND_HYPERV_CMD_GET_HEALTH_INFO) ==
+ DIMM_DSM_UNSUPPORTED) {
+ dbg(ctx, "unsupported function\n");
+ return NULL;
+ }
+
+ size = sizeof(*cmd) + sizeof(struct nd_pkg_hyperv);
+ cmd = calloc(1, size);
+ if (!cmd)
+ return NULL;
+
+ cmd->dimm = dimm;
+ ndctl_cmd_ref(cmd);
+ cmd->type = ND_CMD_CALL;
+ cmd->size = size;
+ cmd->status = 1;
+
+ hyperv = CMD_HYPERV(cmd);
+ hyperv->gen.nd_family = NVDIMM_FAMILY_HYPERV;
+ hyperv->gen.nd_command = ND_HYPERV_CMD_GET_HEALTH_INFO;
+ hyperv->gen.nd_fw_size = 0;
+ hyperv->gen.nd_size_in = offsetof(struct nd_hyperv_smart, status);
+ hyperv->gen.nd_size_out = sizeof(hyperv->u.smart);
+ hyperv->u.smart.status = 0;
+
+ cmd->firmware_status = &hyperv->u.smart.status;
+
+ return cmd;
+}
+
+static int hyperv_smart_valid(struct ndctl_cmd *cmd)
+{
+ if (cmd->type != ND_CMD_CALL ||
+ cmd->size != sizeof(*cmd) + sizeof(struct nd_pkg_hyperv) ||
+ CMD_HYPERV(cmd)->gen.nd_family != NVDIMM_FAMILY_HYPERV ||
+ CMD_HYPERV(cmd)->gen.nd_command != ND_HYPERV_CMD_GET_HEALTH_INFO ||
+ cmd->status != 0 ||
+ CMD_HYPERV_STATUS(cmd) != 0)
+ return cmd->status < 0 ? cmd->status : -EINVAL;
+ return 0;
+}
+
+static int hyperv_cmd_xlat_firmware_status(struct ndctl_cmd *cmd)
+{
+ return CMD_HYPERV_STATUS(cmd) == 0 ? 0 : -EINVAL;
+}
+
+static unsigned int hyperv_cmd_smart_get_flags(struct ndctl_cmd *cmd)
+{
+ int rc;
+
+ rc = hyperv_smart_valid(cmd);
+ if (rc < 0) {
+ errno = -rc;
+ return 0;
+ }
+
+ return ND_SMART_HEALTH_VALID;
+}
+
+static unsigned int hyperv_cmd_smart_get_health(struct ndctl_cmd *cmd)
+{
+ unsigned int health = 0;
+ __u32 num;
+ int rc;
+
+ rc = hyperv_smart_valid(cmd);
+ if (rc < 0) {
+ errno = -rc;
+ return UINT_MAX;
+ }
+
+ num = CMD_HYPERV_SMART_DATA(cmd)->health & 0x3F;
+
+ if (num & (BIT(0) | BIT(1)))
+ health |= ND_SMART_CRITICAL_HEALTH;
+
+ if (num & BIT(2))
+ health |= ND_SMART_FATAL_HEALTH;
+
+ if (num & (BIT(3) | BIT(4) | BIT(5)))
+ health |= ND_SMART_NON_CRITICAL_HEALTH;
+
+ return health;
+}
+
+struct ndctl_dimm_ops * const hyperv_dimm_ops = &(struct ndctl_dimm_ops) {
+ .new_smart = hyperv_dimm_cmd_new_smart,
+ .smart_get_flags = hyperv_cmd_smart_get_flags,
+ .smart_get_health = hyperv_cmd_smart_get_health,
+ .xlat_firmware_status = hyperv_cmd_xlat_firmware_status,
+};
diff --git a/ndctl/lib/hyperv.h b/ndctl/lib/hyperv.h
new file mode 100644
index 0000000..8e55a97
--- /dev/null
+++ b/ndctl/lib/hyperv.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#ifndef __NDCTL_HYPERV_H__
+#define __NDCTL_HYPERV_H__
+
+/* See http://www.uefi.org/RFIC_LIST ("Virtual NVDIMM 0x1901") */
+enum {
+ ND_HYPERV_CMD_QUERY = 0,
+
+ /* non-root commands */
+ ND_HYPERV_CMD_GET_HEALTH_INFO = 1,
+};
+
+/*
+ * This is actually Function 1's data,
+ * This is the closest I can find to match the "smart".
+ * Hyper-V _DSM methods don't have a smart function.
+ */
+struct nd_hyperv_smart_data {
+ __u32 health;
+} __attribute__((packed));
+
+struct nd_hyperv_smart {
+ __u32 status;
+ union {
+ __u8 buf[4];
+ struct nd_hyperv_smart_data data[0];
+ };
+} __attribute__((packed));
+
+union nd_hyperv_cmd {
+ __u32 status;
+ struct nd_hyperv_smart smart;
+} __attribute__((packed));
+
+struct nd_pkg_hyperv {
+ struct nd_cmd_pkg gen;
+ union nd_hyperv_cmd u;
+} __attribute__((packed));
+
+#endif /* __NDCTL_HYPERV_H__ */
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index c9e2875..48bdb27 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -1492,6 +1492,8 @@ static void *add_dimm(void *parent, int id, const char *dimm_base)
dimm->ops = hpe1_dimm_ops;
if (dimm->cmd_family == NVDIMM_FAMILY_MSFT)
dimm->ops = msft_dimm_ops;
+ if (dimm->cmd_family == NVDIMM_FAMILY_HYPERV)
+ dimm->ops = hyperv_dimm_ops;
sprintf(path, "%s/nfit/dsm_mask", dimm_base);
if (sysfs_read_attr(ctx, path, buf) == 0)
diff --git a/ndctl/lib/private.h b/ndctl/lib/private.h
index a387b0b..a9d35c5 100644
--- a/ndctl/lib/private.h
+++ b/ndctl/lib/private.h
@@ -31,6 +31,7 @@
#include "intel.h"
#include "hpe1.h"
#include "msft.h"
+#include "hyperv.h"
struct nvdimm_data {
struct ndctl_cmd *cmd_read;
@@ -270,6 +271,7 @@ struct ndctl_cmd {
struct nd_cmd_pkg pkg[0];
struct ndn_pkg_hpe1 hpe1[0];
struct ndn_pkg_msft msft[0];
+ struct nd_pkg_hyperv hyperv[0];
struct nd_pkg_intel intel[0];
struct nd_cmd_get_config_size get_size[0];
struct nd_cmd_get_config_data_hdr get_data[0];
@@ -344,6 +346,7 @@ struct ndctl_dimm_ops {
struct ndctl_dimm_ops * const intel_dimm_ops;
struct ndctl_dimm_ops * const hpe1_dimm_ops;
struct ndctl_dimm_ops * const msft_dimm_ops;
+struct ndctl_dimm_ops * const hyperv_dimm_ops;
static inline struct ndctl_bus *cmd_to_bus(struct ndctl_cmd *cmd)
{
diff --git a/ndctl/ndctl.h b/ndctl/ndctl.h
index c6aaa4c..008f81c 100644
--- a/ndctl/ndctl.h
+++ b/ndctl/ndctl.h
@@ -262,6 +262,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE1 1
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
+#define NVDIMM_FAMILY_HYPERV 4
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.19.1
1 year, 11 months
[ndctl PATCH 0/8] Improve support + testing for labels + info-blocks
by Dan Williams
As noted in the kernel patches for this issue:
Lately Linux has encountered platforms that collide Persistent
Memory regions between each other, specifically cases where
->start_pad needed to be non-zero. This led to commit ae86cbfef381
"libnvdimm, pfn: Pad pfn namespaces relative to other regions". That
commit allowed namespaces to be mapped with devm_memremap_pages().
However dax operations on those configurations currently fail if
attempted within the ->start_pad range because
pmem_device->data_offset was still relative to raw resource base not
relative to the section aligned resource range mapped by
devm_memremap_pages().
Luckily __bdev_dax_supported() caught these failures and simply
disabled dax. However, to fix this situation a non-backwards
compatible change needs to be made to the interpretation of the
nd_pfn info-block. ->start_pad needs to be accounted in
->map.map_offset (formerly ->data_offset), and ->map.map_base
(formerly ->phys_addr) needs to be adjusted to the section aligned
resource base used to establish ->map.map (formerly ->virt_addr).
Towards preventing similar bugs in this area introduce a regression
test "test/collide.sh" to validate support for pre- and post-fixed
kernels. In the course of developing this test a few missing
capabilities and fixes also surfaced.
---
Dan Williams (8):
ndctl/dimm: Add 'flags' field to read-labels output
ndctl/dimm: Add --human support to read-labels
ndctl/build: Drop -Wpointer-arith
ndctl/namespace: Add read-info-block command
ndctl/test: Update dax-dev to handle multiple e820 ranges
ndctl/test: Make dax.sh more robust vs small namespaces
ndctl/namespace: Always zero info-blocks
ndctl/test: Test inter-region collision handling
configure.ac | 1
ndctl/action.h | 1
ndctl/builtin.h | 1
ndctl/check.c | 20 --
ndctl/dimm.c | 21 ++-
ndctl/namespace.c | 416 +++++++++++++++++++++++++++++++++++++++++++++++++-
ndctl/namespace.h | 51 ++++++
ndctl/ndctl.c | 1
test/Makefile.am | 1
test/collide.sh | 226 +++++++++++++++++++++++++++
test/dax-dev.c | 17 ++
test/dax.sh | 4
test/fsdax-info0.xxd | 11 +
test/fsdax-info1.xxd | 11 +
test/fsdax-info2.xxd | 11 +
test/fsdax-info3.xxd | 11 +
util/fletcher.h | 1
util/size.h | 1
18 files changed, 763 insertions(+), 43 deletions(-)
create mode 100755 test/collide.sh
create mode 100644 test/fsdax-info0.xxd
create mode 100644 test/fsdax-info1.xxd
create mode 100644 test/fsdax-info2.xxd
create mode 100644 test/fsdax-info3.xxd
1 year, 11 months
[mm PATCH v6 0/7] Deferred page init improvements
by Alexander Duyck
This patchset is essentially a refactor of the page initialization logic
that is meant to provide for better code reuse while providing a
significant improvement in deferred page initialization performance.
In my testing on an x86_64 system with 384GB of RAM and 3TB of persistent
memory per node I have seen the following. In the case of regular memory
initialization the deferred init time was decreased from 3.75s to 1.06s on
average. For the persistent memory the initialization time dropped from
24.17s to 19.12s on average. This amounts to a 253% improvement for the
deferred memory initialization performance, and a 26% improvement in the
persistent memory initialization performance.
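For anyone checking the arithmetic: the percentages above appear to be the
ratio of the old time to the new time, minus one. That reading is an
assumption, but it matches the quoted figures. The sketch below (function
names are illustrative) computes that metric alongside the fraction of time
saved, which is a different and smaller number.

```c
/* Percent improvement in rate when a task drops from 'before' to 'after' seconds. */
static double speedup_percent(double before, double after)
{
	return (before / after - 1.0) * 100.0;
}

/* Percent of the original wall-clock time that was eliminated. */
static double time_saved_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the numbers above, speedup_percent(3.75, 1.06) is a little under 254 and
speedup_percent(24.17, 19.12) is a little over 26, while the time saved is
roughly 72% and 21% respectively.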
I have called out the improvement observed with each patch.
Note: This patch set is meant as a replacement for the v5 set that is already
in the MM tree.
I had considered just doing incremental changes but Pavel at the time
had suggested I submit it as a whole set. However, that was almost 3
weeks ago, so if incremental changes are preferred let me know and
I can submit the changes as incremental updates.
I apologize for the delay in submitting this follow-on set. I had been
trying to address the DAX PageReserved bit issue at the same time but
that is taking more time than I anticipated so I decided to push this
before the code sits too much longer.
Commit bf416078f1d83 ("mm/page_alloc.c: memory hotplug: free pages as
higher order") causes issues with the revert of patch 7. It was
necessary to replace all instances of __free_pages_boot_core with
__free_pages_core.
v1->v2:
Fixed build issue on PowerPC due to page struct size being 56
Added new patch that removed __SetPageReserved call for hotplug
v2->v3:
Rebased on latest linux-next
Removed patch that had removed __SetPageReserved call from init
Added patch that folded __SetPageReserved into set_page_links
Tweaked __init_pageblock to use start_pfn to get section_nr instead of pfn
v3->v4:
Updated patch description and comments for mm_zero_struct_page patch
Replaced "default" with "case 64"
Removed #ifndef mm_zero_struct_page
Fixed typo in comment that omitted "_from" in kerneldoc for iterator
Added Reviewed-by for patches reviewed by Pavel
Added Acked-by from Michal Hocko
Added deferred init times for patches that affect init performance
Swapped patches 5 & 6, pulled some code/comments from 4 into 5
v4->v5:
Updated Acks/Reviewed-by
Rebased on latest linux-next
Split core bits of zone iterator patch from MAX_ORDER_NR_PAGES init
v5->v6:
Rebased on linux-next with previous v5 reverted
Drop the "This patch" or "This change" from patch descriptions.
Cleaned up patch descriptions for patches 3 & 4
Fixed kerneldoc for __next_mem_pfn_range_in_zone
Updated several Reviewed-by, and incorporated suggestions from Pavel
Added __init_single_page_nolru to patch 5 to consolidate code
Refactored iterator in patch 7 and fixed several issues
---
Alexander Duyck (7):
mm: Use mm_zero_struct_page from SPARC on all 64b architectures
mm: Drop meminit_pfn_in_nid as it is redundant
mm: Implement new zone specific memblock iterator
mm: Initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections
mm: Move hot-plug specific memory init into separate functions and optimize
mm: Add reserved flag setting to set_page_links
mm: Use common iterator for deferred_init_pages and deferred_free_pages
arch/sparc/include/asm/pgtable_64.h | 30 --
include/linux/memblock.h | 41 +++
include/linux/mm.h | 50 +++
mm/memblock.c | 64 ++++
mm/page_alloc.c | 571 +++++++++++++++++++++--------------
5 files changed, 498 insertions(+), 258 deletions(-)
--
1 year, 11 months