Thanks for reviewing this patch Ira,
Ira Weiny <ira.weiny(a)intel.com> writes:
On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote:
> A warning is reported by the kernel in case perf_stats_show() returns
> an error code. The warning is of the form below:
> papr_scm ibm,persistent-memory:ibm,pmemory@44100001:
> Failed to query performance stats, Err:-10
> dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
> fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
> On investigation it looks like that the compiler is silently truncating the
> return value of drc_pmem_query_stats() from 'long' to 'int', since
> variable used to store the return code 'rc' is an 'int'. This
> truncated value is then returned back as a 'ssize_t' back from
> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
> unsigned number and triggers this warning..
> To fix this we update the type of variable 'rc' from 'int' to
> 'ssize_t' that prevents the compiler from truncating the return value
> of drc_pmem_query_stats() and returning correct signed value back from
> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
> stats from PHYP')
> Signed-off-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
> arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> index a88a707a608aa..9f00b61676ab9 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> static ssize_t perf_stats_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> - int index, rc;
> + int index;
> + ssize_t rc;
I'm not sure this is really fixing everything here.
The issue is with the statement in perf_stats_show():
'return rc ? rc : seq_buf_used(&s);'
The function seq_buf_used() returns an 'unsigned int' and with 'rc'
typed as 'int', forces a promotion of the expression to 'unsigned int'
which causes a loss of signedness of 'rc' and compiler silently
assigns this unsigned value to the function return typed as 'signed
Making 'rc', a 'signed long' forces a promotion of the expresion to
'signed long' which preserves the signedness of 'rc' and will also be
compatible with the function return type.
drc_pmem_query_stats() can return negative errno's. Why are those not checked
somewhere in perf_stats_show()?
For the specific invocation 'drc_pmem_query_stats(p, stats, 0)' we only
expect return value 'rc <=0' with '0' indicating a successful fetch of
nvdimm performance stats from hypervisor. Hence there are no explicit
checks for negative error codes in the functions as all return values
!=0 indicate an error.
It seems like all this fix is handling is a > 0 return value:
line 289 in papr_scm.c... Or something?
No, in case the arg 'num_stats' is
'0' and 'buff_stats != NULL' the
variable 'size' is assigned a non-zero value hence that specific branch
you mentioned is never taken. Instead in case of success
drc_pmem_query_stats() return '0' and in case of an error a negative
error code is returned.
Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed value.
Therefore, it should not be returning -errno. I'm surprised the static
checkers did not catch that.
Didnt quite get the assertion here. The function is
marked to return
'ssize_t' because we can return both +ve for success and -ve values to
I believe I caught similar errors with a patch series before which did not pay
attention to variable types.
Please audit this code for these types of errors and ensure you are really
doing the correct thing when using the sysfs interface. I'm pretty sure bad
things will eventually happen (if they are not already) if you return some
really big number to the sysfs core from *_show().
I think this problem is
different compared to what you had previously pointed
to. The values returned from drc_pmem_query_stats() can be stored in an
'int' variable too, however it was the silent promotion of a signed type
to unsigned type was what caused this specific issue.