Arnaldo Carvalho de Melo [Wed, 25 Apr 2018 15:23:17 +0000 (12:23 -0300)]
perf tests vmlinux-kallsyms: Use machine__find_kernel_function(_by_name)
We have had these for ages, IIRC initially for 'perf probe' use, so use them
instead of the variants that take the map_type, which is going away.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-x1jpogsvj822sh0q8leiaoep@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 25 Apr 2018 15:18:11 +0000 (12:18 -0300)]
perf machine: Remove needless map_type from machine__load_vmlinux_path()
Since it uses machine__kernel_map() and this function always returns the
MAP__FUNCTION map, it doesn't make sense to call it with MAP__VARIABLE.
And also this is a step in the direction of nuking the MAP__{FUNCTION,VARIABLE}
split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0h3eof3kx3kq32ixg5fquf3p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 25 Apr 2018 14:40:32 +0000 (11:40 -0300)]
perf machine: Shorten machine__load_kallsyms() signature
So far the only use is for MAP__FUNCTION, and since we're going to
remove that split, remove the map_type argument in machine__load_kallsyms().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-5dhgh7x8g9hx5hpxlp3k08jp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 20:06:25 +0000 (17:06 -0300)]
perf machine: Introduce machine__kernel_maps()
That returns a data structure containing the ordered list of kernel
modules + the main kernel maps, one more step in removing the
MAP__{FUNCTION,VARIABLE} split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qsgbxfyaohc80c9ma049dubm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Takashi Iwai [Tue, 24 Apr 2018 15:04:56 +0000 (17:04 +0200)]
perf Documentation: Support for asciidoctor
The asciidoc package seems behind the recent big wave of python3
conversion, and we were advised to switch to asciidoctor instead. It's
almost compatible but some extensions used for perf documentation don't
work with it. Here is the patch to cover them, and add the proper
support for asciidoctor.
Pass USE_ASCIIDOCTOR=yes to make for using asciidoctor instead of
asciidoc. The man source and manual attributes are passed via command
options. The support for these attributes has been fixed in the
latest asciidoctor code.
Since asciidoctor can convert to a man page and HTML directly, we
can omit the dependency on xmlto when USE_ASCIIDOCTOR is set.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180424150456.17353-1-tiwai@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 15:16:09 +0000 (12:16 -0300)]
perf map: Shorten map_groups__find_by_name() signature
Another step on the road to eliminating the MAP__{FUNCTION,VARIABLE}
separation, reducing the exposure to these details in the tools using
the symbol APIs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-8a1hvrqe3r5i0kw865u3uxwt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 15:05:48 +0000 (12:05 -0300)]
perf thread: Make thread__find_symbol() return the symbol searched
Instead of just returning it in al.sym, allowing for some simplification
in its users, and to make it consistent with thread__find_map().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4axi2sigslffdixzxbehvgoj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 14:58:56 +0000 (11:58 -0300)]
perf thread: Make thread__find_map() return the map
It was returning the searched map just on the addr_location passed, with
the function itself returning void.
Make it return the map so that we can make the code more compact.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tzlrrzdeoof4i6ktyqv1t6ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 14:32:30 +0000 (11:32 -0300)]
perf script: Use thread__find_symbol() instead of ad-hoc equivalent
In dc323ce8e72d ("perf script: Enable printing of branch stack") the code
first tries to find the map for an address, then the symbol in the DSO
backing that map for that address. This is what thread__find_symbol()
does, so just use it and make the code shorter.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-03nx3aod955yqnf9l06im28j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 14:24:49 +0000 (11:24 -0300)]
perf thread: Introduce thread__find_symbol()
Out of thread__find_addr_location(..., MAP__FUNCTION, ...), idea here is to
continue removing references to MAP__{FUNCTION,VARIABLE} ahead of
getting both types of symbols in the same rbtree, as various places do
two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE.
So thread__find_symbol() will eventually do just that, and 'struct
symbol' will have the symbol type, for code that cares about that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-n7528en9e08yd3flzmb26tth@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Hendrik Brueckner [Fri, 13 Apr 2018 07:42:23 +0000 (09:42 +0200)]
perf tests: Let 'perf test list' display subtests
The output of perf test and perf test list differ because perf test list
does not display subtests. Correct this behavior and also let perf test
list report subtests.
For example:
$ ./perf test 2>&1 |wc -l
65
Without this commit:
$ ./perf test list 2>&1 |wc -l
57
With this commit:
$ ./perf test list 2>&1 |wc -l
65
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: linux-s390@vger.kernel.org
LPU-Reference: 1523605343-11970-1-git-send-email-brueckner@linux.ibm.com
Link: https://lkml.kernel.org/n/tip-efb74jw7x2xs2bucp5hf4ilu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 24 Apr 2018 13:49:50 +0000 (10:49 -0300)]
perf thread: Introduce thread__find_map()
Out of thread__find_addr_map(..., MAP__FUNCTION, ...), idea here is to
continue removing references to MAP__{FUNCTION,VARIABLE} ahead of
getting both types of symbols in the same rbtree, as various places do
two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE.
So thread__find_map() will eventually do just that, and 'struct symbol'
will have the symbol type, for code that cares about that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-q27xee34l4izpfau49w103s6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 23 Apr 2018 20:13:49 +0000 (17:13 -0300)]
perf map: Introduce map__has_symbols()
To further simplify checking if symbols are available for a given map
and to reduce the number of users of MAP__{FUNCTION,VARIABLE}.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-iyfoyvbfdti5uehgpjum3qrq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 23 Apr 2018 20:08:02 +0000 (17:08 -0300)]
perf dso: Add dso__has_symbols() method
To replace longer code sequences in various places.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tlk3klbkfyjrbfjvryyznfju@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 23 Apr 2018 19:43:47 +0000 (16:43 -0300)]
perf symbols: Use __map__is_kernel() instead of ad-hoc equivalent code
Shorter, should be equivalent code, use it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-q90olng8sfkvrnsrwu7xnul6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 23 Apr 2018 19:40:02 +0000 (16:40 -0300)]
perf top: Use __map__is_kernel()
Shorter form to figure out if a given map is the kernel one and also
reduces the number of code accessing MAP__{FUNCTION,VARIABLE}, that
should go away at some point.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-rn8pexelsxpx92ce3elu3wiw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:22 +0000 (11:08 +0200)]
perf stat: Display length strings of each run for --table option
Adding support to display visual aid 'length strings' to easily spot the
biggest difference in time table.
$ perf stat -r 10 --table perf bench sched pipe
...
Performance counter stats for './perf bench sched pipe' (5 runs):
# Table of individual measurements:
5.189 (-0.293) #
5.189 (-0.294) #
5.186 (-0.296) #
5.663 (+0.181) ##
6.186 (+0.703) ####
# Final result:
5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-9-jolsa@kernel.org
[ Updated 'perf stat --table' man page entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:21 +0000 (11:08 +0200)]
perf stat: Add --table option to display time of each run
Add --table option to display time for each run (-r option), like:
$ perf stat --null -r 5 --table perf bench sched pipe
Performance counter stats for './perf bench sched pipe' (5 runs):
# Table of individual measurements:
5.379 (-0.176)
5.243 (-0.311)
5.238 (-0.317)
5.536 (-0.019)
6.377 (+0.823)
# Final result:
5.555 +- 0.213 seconds time elapsed ( +- 3.83% )
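A minimal sketch of how the per-run deltas in such a table relate to the mean. This is not perf's implementation, and the function name `run_deltas` is hypothetical. It assumes each delta is simply the run time minus the arithmetic mean of all runs, which agrees with the example figures above to within display rounding:

```python
from statistics import mean


def run_deltas(times):
    """Return (mean, [(time, time - mean), ...]) for a list of run times.

    Assumption: the value in parentheses is the plain difference from the mean.
    """
    avg = mean(times)
    return avg, [(t, t - avg) for t in times]


# Run times copied from the example output in the commit message.
avg, rows = run_deltas([5.379, 5.243, 5.238, 5.536, 6.377])
for t, delta in rows:
    print(f"{t:.3f} ({delta:+.3f})")
```

The last digit can differ from the perf output because perf works from nanosecond timestamps, while this sketch starts from the already rounded values.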
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-8-jolsa@kernel.org
[ Document the new option in 'perf stat's man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:20 +0000 (11:08 +0200)]
perf stat: Display time in precision based on std deviation
Ingo suggested displaying the elapsed time for multi-run workloads (perf stat
-r) with precision based on the precision of the standard deviation.
In his own words:
> This output is a slightly bit misleading:
> Performance counter stats for 'make -j128' (10 runs):
> 27.988995256 seconds time elapsed ( +- 0.39% )
> The 9 significant digits in the result, while only 1 is valid, suggests accuracy
> where none exists.
> It would be better if 'perf stat' would display elapsed time with a precision
> adjusted to stddev, it should display at most 2 more significant digits than
> the stddev inaccuracy.
> I.e. in the above case 0.39% is 0.109, so we only have accuracy for 1 digit, and
> so we should only display 3:
> 27.988 seconds time elapsed ( +- 0.39% )
Plus a suggestion about the output, which is small enough and connected
with the above change that I merged both changes together.
> Small output style nit - I think it would be nice if with --repeat the stddev was
> also displayed in absolute values, besides percentage:
>
> 27.988 +- 0.109 seconds time elapsed ( +- 0.39% )
The output is now:
Performance counter stats for './perf bench sched pipe' (5 runs):
SNIP
13.3667 +- 0.0256 seconds time elapsed ( +- 0.19% )
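The precision rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the helper names `decimals_for` and `format_elapsed` are hypothetical, and the rule is taken to be "decimal position of the stddev's first significant digit, plus two more digits":

```python
import math


def decimals_for(stddev, extra=2):
    """Number of decimal places to print for a given standard deviation.

    Assumption: for a stddev in (0, 1) take the position of its first
    significant digit and add `extra` digits; otherwise print `extra`.
    """
    if stddev <= 0 or stddev >= 1:
        return extra
    return math.ceil(-math.log10(stddev)) + extra


def format_elapsed(avg, stddev):
    """Format mean and stddev with the same stddev-derived precision."""
    d = decimals_for(stddev)
    return f"{avg:.{d}f} +- {stddev:.{d}f} seconds time elapsed"


print(format_elapsed(13.3667, 0.0256))
```

With a stddev of 0.109 the rule yields 3 decimal places, and with 0.0256 it yields 4, matching the two examples in the message. Note that Python's formatting rounds the last digit, whereas the quoted 27.988 appears truncated.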
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:16 +0000 (11:08 +0200)]
perf check-headers.sh: Add support to check 2 independent files
Add 'check_2' function to check 2 different files, the 'check' function
stays to check files that differ only in the prefix path.
In upcoming changes we need to check header files in locations which
don't follow the prefix logic.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:15 +0000 (11:08 +0200)]
perf check-headers.sh: Simplify arguments passing
Pass the whole string instead of parsing the arguments afterwards. It
simplifies things for the next patches, which add another function call
that makes it hard to pass arguments in the correct shape.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Tue, 17 Apr 2018 04:13:46 +0000 (09:43 +0530)]
perf buildid-cache: Support --purge-all option
Users can remove files from the cache using the --remove/--purge options, but
both need a list of files as an argument. That is not convenient when you want
to flush out the entire cache. Add an option to purge all files from the cache.
Ex,
# perf buildid-cache -l
8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
ebe71fdcf4b366518cc154d570a33cd461a51c36 /tmp/a.out.1
# perf buildid-cache -P -v
Removing /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c): Ok
Removing /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36): Ok
Purged all: Ok
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180417041346.5617-4-ravi.bangoria@linux.vnet.ibm.com
[ Initialize 'err' in build_id_cache__purge_all(), to fix build on debian:7, as it can be used uninitialized ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Tue, 17 Apr 2018 04:13:45 +0000 (09:43 +0530)]
perf buildid-cache: Support --list option
'perf buildid-cache' allows adding/removing files to/from the cache but there is
no option to list all cached files. Add --list option to list all
_valid_ cached files.
Ex,
# perf buildid-cache --add /tmp/a.out
# perf buildid-cache -l
8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180417041346.5617-3-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Thu, 26 Apr 2018 05:28:29 +0000 (07:28 +0200)]
Merge tag 'perf-urgent-for-mingo-4.17-20180425' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf stat:
- Keep the '/' event modifier separator in fallback, for example when
falling back from 'cpu/cpu-cycles/' to user level only, where it should
become 'cpu/cpu-cycles/u' and not 'cpu/cpu-cycles/:u' (Jiri Olsa)
- Fix PMU events parsing rule, improving error reporting for
invalid events (Jiri Olsa)
- Disable write_backward and other event attributes for !group
events in a group, fixing, for instance this group: '{cycles,msr/aperf/}:S'
that has leader sampling (:S) and where just the 'cycles',
the leader event, should have the write_backward attribute
set, in this case it all fails because the PMU where 'msr/aperf/'
lives doesn't accept write_backward style sampling (Jiri Olsa)
- Only fall back group read for leader (Kan Liang)
- Fix core PMU alias list for x86 platform (Kan Liang)
- Print out hint for mixed PMU group error (Kan Liang)
- Fix duplicate PMU name for interval print (Kan Liang)
Core:
- Set main kernel end address properly when reading kernel and
module maps (Namhyung Kim)
perf mem:
- Fix incorrect entries and add missing man options (Sangwon Hong)
s/390:
- Remove s390 specific strcmp_cpuid_cmp function (Thomas Richter)
- Adapt 'perf test' case record+probe_libc_inet_pton.sh for s390
- Fix s390 undefined record__auxtrace_init() return value in
'perf record' (Thomas Richter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Wed, 25 Apr 2018 18:57:17 +0000 (14:57 -0400)]
perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1
The SMM freeze feature was introduced in PerfMon V2, but the current
code unconditionally enables the feature for all platforms. It can
generate a #GP exception if the related FREEZE_WHILE_SMM bit is set on
a machine with PerfMon V1.
To disable the feature for PerfMon V1, perf needs to
- Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to
intel_pmu, which is only applied to PerfMon V2 and later.
- Check the PerfMon version before flipping the SMM bit when starting CPU
Fixes: 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: eranian@google.com
Cc: acme@redhat.com
Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
Kan Liang [Tue, 24 Apr 2018 18:20:14 +0000 (11:20 -0700)]
perf stat: Fix duplicate PMU name for interval print
PMU name is printed repeatedly for interval print, for example:
perf stat --no-merge -e 'unc_m_clockticks' -a -I 1000
# time counts unit events
1.001053069 243,702,144 unc_m_clockticks [uncore_imc_4]
1.001053069 244,268,304 unc_m_clockticks [uncore_imc_2]
1.001053069 244,427,386 unc_m_clockticks [uncore_imc_0]
1.001053069 244,583,760 unc_m_clockticks [uncore_imc_5]
1.001053069 244,738,971 unc_m_clockticks [uncore_imc_3]
1.001053069 244,880,309 unc_m_clockticks [uncore_imc_1]
2.002024821 240,818,200 unc_m_clockticks [uncore_imc_4] [uncore_imc_4]
2.002024821 240,767,812 unc_m_clockticks [uncore_imc_2] [uncore_imc_2]
2.002024821 240,764,215 unc_m_clockticks [uncore_imc_0] [uncore_imc_0]
2.002024821 240,759,504 unc_m_clockticks [uncore_imc_5] [uncore_imc_5]
2.002024821 240,755,992 unc_m_clockticks [uncore_imc_3] [uncore_imc_3]
2.002024821 240,750,403 unc_m_clockticks [uncore_imc_1] [uncore_imc_1]
For each print, the PMU name is unconditionally appended to the
counter->name.
Need to check the counter->name first. If the PMU name is already
appended, do nothing.
Committer notes:
Add and use perf_evsel->uniquified_name bool instead of doing the more
expensive strstr(event->name, pmu->name).
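The committer's approach, a boolean flag instead of a substring search, can be illustrated with a small sketch. The `Evsel` class and `uniquify` function below are hypothetical stand-ins, not perf's C structures; only the field name `uniquified_name` comes from the note above:

```python
from dataclasses import dataclass


@dataclass
class Evsel:
    """Hypothetical stand-in for an event selector with a display name."""
    name: str
    pmu_name: str
    uniquified_name: bool = False  # set once the PMU name has been appended


def uniquify(evsel):
    """Append ' [pmu]' to the event name at most once."""
    if evsel.uniquified_name:
        return  # already done on an earlier interval print
    evsel.name = f"{evsel.name} [{evsel.pmu_name}]"
    evsel.uniquified_name = True


ev = Evsel("unc_m_clockticks", "uncore_imc_4")
for _ in range(3):  # simulate three interval prints
    uniquify(ev)
print(ev.name)
```

Checking a flag is constant time and does not depend on the name's contents, which is the reason given for preferring it over strstr().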
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 8c5421c016a4 ("perf pmu: Display pmu name when printing unmerged events in stat")
Link: http://lkml.kernel.org/r/1524594014-79243-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 24 Apr 2018 18:20:12 +0000 (11:20 -0700)]
perf evsel: Only fall back group read for leader
Perf doesn't support mixed events from different PMUs (except software
event) in a group. The perf stat should output <not counted>/<not
supported> for all events, but it doesn't. For example,
perf stat -e '{cycles,uncore_imc_5/umask=0xF,event=0x4/,instructions}'
<not counted> cycles
<not supported> uncore_imc_5/umask=0xF,event=0x4/
1,024,300 instructions
If perf fails to open an event, it doesn't error out directly. It will
disable some features and retry, until the event is opened or all
features are disabled. The disabled features will not be re-enabled. The
group read is one of these features.
For the example as above, the IMC event and the leader event "cycles"
are from different PMUs. Opening the IMC event must fail. The group read
feature must be disabled for the IMC event and the following event
"instructions". The "instructions" event has the same PMU as the leader
"cycles". It can be opened successfully. Since the group read feature
has been disabled, the "instructions" event will be read as a single
event, which definitely has a value.
The group read fallback is still useful for the case where the kernel
doesn't support group read. It is good enough to be handled only by the
leader.
For the fallback request from members, it must be caused by an error.
The fallback only breaks the semantics of group. Limit the group read
fallback only for the leader.
Committer testing:
On a broadwell t450s notebook:
Before:
# perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not supported> unc_cbo_cache_lookup.read_i
818,206 instructions
1.003170887 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
After:
# perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not supported> unc_cbo_cache_lookup.read_i
<not counted> instructions
1.001380511 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
#
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 82bf311e15d2 ("perf stat: Use group read for event groups")
Link: http://lkml.kernel.org/r/1524594014-79243-3-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 24 Apr 2018 18:20:11 +0000 (11:20 -0700)]
perf stat: Print out hint for mixed PMU group error
Perf doesn't support mixed events from different PMUs (except software
event) in a group. For this case, only "<not counted>" or "<not
supported>" are printed out. There is no hint which guides users to fix
the issue.
Check the PMU type of the events to determine whether they are from the same
PMU. The check may raise a false alarm, e.g. when the core PMU has a
different PMU type, but that should not happen often.
The false alarm can also be tolerated, because:
- It only happens on error path.
- It just provides a possible solution for the issue.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1524594014-79243-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 24 Apr 2018 18:20:10 +0000 (11:20 -0700)]
perf pmu: Fix core PMU alias list for X86 platform
When counting an uncore event by alias, the core PMU is mistakenly
included, for example:
perf stat --no-merge -e "unc_m_cas_count.all" -C0 sleep 1
Performance counter stats for 'CPU(s) 0':
0 unc_m_cas_count.all [uncore_imc_4]
0 unc_m_cas_count.all [uncore_imc_2]
0 unc_m_cas_count.all [uncore_imc_0]
153,640 unc_m_cas_count.all [cpu]
0 unc_m_cas_count.all [uncore_imc_5]
25,026 unc_m_cas_count.all [uncore_imc_3]
0 unc_m_cas_count.all [uncore_imc_1]
1.001447890 seconds time elapsed
The reason is that the current implementation doesn't check the PMU name of an
event when adding its alias to the alias list for the core PMU. The
uncore event aliases are mistakenly added.
This bug was introduced in:
commit 14b22ae028de ("perf pmu: Add helper function is_pmu_core to
detect PMU CORE devices")
Checking the PMU name for all PMUs on X86 and other architectures except
ARM.
There is no behavior change for ARM.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 14b22ae028de ("perf pmu: Add helper function is_pmu_core to detect PMU CORE devices")
Link: http://lkml.kernel.org/r/1524594014-79243-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 23 Apr 2018 14:29:40 +0000 (16:29 +0200)]
perf record: Fix s390 undefined record__auxtrace_init() return value
Command 'perf record' calls:
cmd_record()
record__auxtrace_init()
auxtrace_record__init()
On s390 the function auxtrace_record__init() returns a random value due
to missing initialization.
This sometimes causes 'perf record' to exit immediately without an error
message and without creating a perf.data file.
Fix this by setting the error return code to zero before returning from
platform specific functions which may not set the error code in all
cases.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180423142940.21143-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sangwon Hong [Sun, 22 Apr 2018 07:29:06 +0000 (16:29 +0900)]
perf mem: Document incorrect and missing options
Several options were incorrectly described, some lacked describing
required arguments while others were simply not documented, fix it.
Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1524382146-19609-1-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:19 +0000 (11:08 +0200)]
perf evsel: Disable write_backward for leader sampling group events
.. and other related fields that do not need to be enabled
for events that have sampling leader.
It fixes the perf top usage Ingo reported broken:
# perf top -e '{cycles,msr/aperf/}:S'
The 'msr/aperf/' event is configured for write_backward sampling, which is
not allowed by the MSR PMU, so it fails to create the event.
Adjusting related attr test.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:18 +0000 (11:08 +0200)]
perf pmu: Fix pmu events parsing rule
Currently all the event parsing failures end up in the event_pmu rule, and
display misleading help like:
$ perf stat -e inst kill
event syntax error: 'inst'
\___ Cannot find PMU `inst'. Missing kernel support?
...
The reason is that the event_pmu rule is too greedy and also matches a single
string. Change it to force the '/' separators to be part of the rule,
so that the proper error is reported:
$ perf stat -e inst kill
event syntax error: 'inst'
\___ parser error
Run 'perf list' for a list of valid events
...
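The effect of requiring the '/' separators can be illustrated with a small
Python sketch. This is not the perf grammar: the regular expression and the
classify() helper are hypothetical stand-ins for the parser rule.

```python
import re

# Hypothetical stand-in for the grammar rule: a PMU event must be written as
# name '/' terms '/', optionally followed by a modifier string.
PMU_EVENT = re.compile(r"[A-Za-z_][\w-]*/[^/]*/[A-Za-z]*")


def classify(event: str) -> str:
    """Return which (illustrative) outcome an event string would trigger."""
    if PMU_EVENT.fullmatch(event):
        return "pmu event"
    # A bare string such as 'inst' no longer reaches the PMU rule, so the
    # generic parser error is reported instead of "Cannot find PMU".
    return "parser error"
```

With this rule, 'inst' is rejected as a plain parser error while
'cpu/cpu-cycles/' is still recognized as a PMU event.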
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 23 Apr 2018 09:08:17 +0000 (11:08 +0200)]
perf stat: Keep the / modifier separator in fallback
The 'perf stat' fallback for EACCES error sets the exclude_kernel
perf_event_attr and tries perf_event_open() again with it. In addition,
it also changes the name of the event to reflect that change by adding
the 'u' modifier.
But it does not take into account the '/' separator, so the event name
can end up mangled, like: (note the '/:' characters)
$ perf stat -e cpu/cpu-cycles/ kill
...
386,832 cpu/cpu-cycles/:u
Adding the code to check on the '/' separator and set the following
correct event name:
$ perf stat -e cpu/cpu-cycles/ kill
...
388,548 cpu/cpu-cycles/u
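The renaming rule can be sketched in Python as follows. This is only an
illustration of the behaviour described above, not the perf C code; the
add_user_modifier() helper is hypothetical, and treating a trailing '/' as
the separator check is an assumption.

```python
def add_user_modifier(event_name: str) -> str:
    """Append the 'u' modifier to an event name (hypothetical helper)."""
    # A PMU-style name such as 'cpu/cpu-cycles/' already ends with the '/'
    # separator, so the modifier follows it directly.
    if event_name.endswith("/"):
        return event_name + "u"
    # A plain event name needs the ':' separator before the modifier.
    return event_name + ":u"
```

For 'cpu/cpu-cycles/' this gives 'cpu/cpu-cycles/u' rather than the mangled
'cpu/cpu-cycles/:u'.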
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 23 Apr 2018 08:24:28 +0000 (10:24 +0200)]
perf test: Adapt test case record+probe_libc_inet_pton.sh for s390
perf test case 58 (record+probe_libc_inet_pton.sh) executed on s390x
using kernel 4.16.0rc3 displays this result:
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
probe_libc:inet_pton: (3ffa0240448)
__GI___inet_pton (/usr/lib64/libc-2.26.so)
gaih_inet (inlined)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
After I installed kernel 4.16.0 the same test uses the commands:
# perf record -e probe_libc:inet_pton/call-graph=dwarf/ -o /tmp/perf.data.abc ping -6 -c 1 ::1
# perf script -i /tmp/perf.data.abc
and displays:
ping 39048 [006] 84230.381198: probe_libc:inet_pton: (3ffa0240448)
140448 __GI___inet_pton (/usr/lib64/libc-2.26.so)
fbde1 gaih_inet (inlined)
fe2b9 __GI_getaddrinfo (inlined)
398d main (/usr/bin/ping)
Nothing else changed, including glibc, elfutils and other libraries picked
up by the build.
The entries for __libc_start_main and _start are missing.
I bisected the missing __libc_start_main and _start entries to this commit:
Fixes: 3d20c6246690 ("perf unwind: Unwind with libdw doesn't take symfs into account")
When I undo this commit I get this call stack on s390:
[root@s35lp76 perf]# ./perf script -i /tmp/perf.data.abc
ping 39048 [006] 84230.381198: probe_libc:inet_pton: (3ffa0240448)
140448 __GI___inet_pton (/usr/lib64/libc-2.26.so)
fbde1 gaih_inet (inlined)
fe2b9 __GI_getaddrinfo (inlined)
398d main (/usr/bin/ping)
22fbd __libc_start_main (/usr/lib64/libc-2.26.so)
457b _start (/usr/bin/ping)
Looks like dwarf functions dwfl_xxx create different call back stack
trace when using file /usr/lib/debug/usr/bin/ping-20161105-7.fc27.s390x.debug
instead of file /usr/bin/ping.
Fix this test case on s390 and do not expect any call back stack entry
after the main() function. Also be more robust and accept a leading
__GI_ prefix in front of getaddrinfo.
On x86 this test case shows the same call stack using both kernel
versions 4.16.0rc3 and 4.16.0 and also stops at main:
[root@f27 perf]# ./perf script -i /tmp/perf.data.tmr
ping 4446 [000] 172.027088: probe_libc:inet_pton: (7fdfa08c93c0)
1393c0 __GI___inet_pton (/usr/lib64/libc-2.26.so)
fe60d getaddrinfo (/usr/lib64/libc-2.26.so)
2f40 main (/usr/bin/ping)
[root@f27 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Martin Vuille <jpmv27@aim.com>
Link: http://lkml.kernel.org/r/20180423082428.7930-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 23 Apr 2018 08:17:45 +0000 (10:17 +0200)]
perf list: Remove s390 specific strcmp_cpuid_cmp function
Make the type field in pmu-events/arch/s390/mapfile.csv more generic to
match the created cpuid string for s390.
The pattern also checks for the counter first version number and counter
second version number ([13]\.[1-5]) and the authorization field which
follows.
These numbers do not exist in the cpuid identification string when perf
commands are executed on a z/VM environment (which does not support CPU
counter measurement facility).
CPUID string for LPAR:
cpuid : IBM,3906,704,M03,3.5,002f
CPUID string for z/VM:
cpuid : IBM,2964,702,N96
This allows the removal of s390 specific cpuid compare code and uses the
common compare function with its regular expression matching algorithm.
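How such a pattern separates the two cpuid strings can be checked with a
short Python sketch. The pattern below is hypothetical, written only in the
spirit of the mapfile type field (it is not copied from the mapfile), and
cpuid_matches() is an illustrative helper.

```python
import re

# Hypothetical pattern: vendor, a four-digit machine type, then further
# fields, the counter version pair ([13]\.[1-5]) and a hexadecimal
# authorization field at the end.
PATTERN = re.compile(r"^IBM,[0-9]{4},.*,[13]\.[1-5],[0-9a-fA-F]+$")

LPAR_CPUID = "IBM,3906,704,M03,3.5,002f"
ZVM_CPUID = "IBM,2964,702,N96"


def cpuid_matches(cpuid: str) -> bool:
    """True when the cpuid string carries the version and authorization fields."""
    return PATTERN.search(cpuid) is not None
```

The LPAR string matches because it ends in "3.5,002f"; the z/VM string has
no counter version numbers and therefore does not match.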
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180423081745.3672-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 19 Feb 2018 10:05:45 +0000 (19:05 +0900)]
perf machine: Set main kernel end address properly
map_groups__fixup_end() was called to set the end addresses of kernel
and module maps. But now since machine__create_modules() sets the end
address of modules properly, the only remaining piece is the kernel map.
We can set it with adjacent module's address directly instead of calling
map_groups__fixup_end(). If there's no module after the kernel map, the
end address will be ~0ULL.
Since it also changes the start address of the kernel map, it needs to
re-insert the map to the kmaps in order to keep a correct ordering. Kim
reported that it caused problems on ARM64.
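The end-address rule can be sketched as follows. This is a Python
illustration under simplifying assumptions (maps reduced to their start
addresses, made-up addresses), not the perf implementation;
kernel_map_end() is a hypothetical helper.

```python
ULLONG_MAX = 2**64 - 1  # the value of ~0ULL


def kernel_map_end(kernel_start: int, module_starts: list[int]) -> int:
    """End address of the kernel map: start of the next module, else ~0ULL."""
    # Only modules mapped above the kernel can bound its end address.
    following = [start for start in module_starts if start > kernel_start]
    return min(following) if following else ULLONG_MAX
```

For example, with the kernel at 0xffffffff81000000 and modules starting at
0xffffffffa0000000 and 0xffffffffa0100000, the kernel map ends at
0xffffffffa0000000.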
Reported-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20180419235915.GA19067@sejong
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 23 Apr 2018 02:20:09 +0000 (19:20 -0700)]
Linux 4.17-rc2
Linus Torvalds [Mon, 23 Apr 2018 00:14:29 +0000 (17:14 -0700)]
Merge tag 'drm-fixes-for-v4.17-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Exynos, i915, vc4, amdgpu fixes.
i915:
- an oops fix
- two race fixes
- some gvt fixes
amdgpu:
- dark screen fix
- clk/voltage fix
- vega12 smu fix
vc4:
- memory leak fix
exynos just drops some code"
* tag 'drm-fixes-for-v4.17-rc2' of git://people.freedesktop.org/~airlied/linux: (23 commits)
drm/amd/powerplay: header file interface to SMU update
drm/amd/pp: Fix bug voltage can't be OD separately on VI
drm/amd/display: Don't program bypass on linear regamma LUT
drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state
drm/i915/audio: Fix audio detection issue on GLK
drm/i915: Call i915_perf_fini() on init_hw error unwind
drm/i915/bios: filter out invalid DDC pins from VBT child devices
drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6
drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value
drm/exynos: exynos_drm_fb -> drm_framebuffer
drm/exynos: Move dma_addr out of exynos_drm_fb
drm/exynos: Move GEM BOs to drm_framebuffer
drm: Fix HDCP downstream dev count read
drm/vc4: Fix memory leak during BO teardown
drm/i915/execlists: Clear user-active flag on preemption completion
drm/i915/gvt: Add drm_format_mod update
drm/i915/gvt: Disable primary/sprite/cursor plane at virtual display initialization
drm/i915/gvt: Delete redundant error message in fb_decode.c
drm/i915/gvt: Cancel dma map when resetting ggtt entries
drm/i915/gvt: Missed to cancel dma map for ggtt entries
...
Dave Airlie [Sun, 22 Apr 2018 22:54:06 +0000 (08:54 +1000)]
Merge branch 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-next
- Fix a dark screen issue in DC
- Fix clk/voltage dependency tracking for wattman
- Update SMU interface for vega12
* 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: header file interface to SMU update
drm/amd/pp: Fix bug voltage can't be OD separately on VI
drm/amd/display: Don't program bypass on linear regamma LUT
Dave Airlie [Sun, 22 Apr 2018 22:53:41 +0000 (08:53 +1000)]
Merge tag 'exynos-drm-fixes-for-v4.17-rc2' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-next
Remove Exynos specific framebuffer structure and
relevant functions.
- it removes exynos_drm_fb structure which is a wrapper of
drm_framebuffer and unnecessary two exynos specific callback
functions, exynos_drm_destory() and exynos_drm_fb_create_handle()
because we can reuse existing drm common callback ones instead.
* tag 'exynos-drm-fixes-for-v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: exynos_drm_fb -> drm_framebuffer
drm/exynos: Move dma_addr out of exynos_drm_fb
drm/exynos: Move GEM BOs to drm_framebuffer
drm/amdkfd: Deallocate SDMA queues correctly
drm/amdkfd: Fix scratch memory with HWS enabled
Dave Airlie [Sun, 22 Apr 2018 22:53:27 +0000 (08:53 +1000)]
Merge tag 'drm-intel-next-fixes-2018-04-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Fix for FDO #105549: Avoid OOPS on bad VBT (Jani)
- Fix rare pre-emption race (Chris)
- Fix RC6 race against PM transitions (Tvrtko)
* tag 'drm-intel-next-fixes-2018-04-19' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915/audio: Fix audio detection issue on GLK
drm/i915: Call i915_perf_fini() on init_hw error unwind
drm/i915/bios: filter out invalid DDC pins from VBT child devices
drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6
drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value
drm/i915/execlists: Clear user-active flag on preemption completion
drm/i915/gvt: Add drm_format_mod update
drm/i915/gvt: Disable primary/sprite/cursor plane at virtual display initialization
drm/i915/gvt: Delete redundant error message in fb_decode.c
drm/i915/gvt: Cancel dma map when resetting ggtt entries
drm/i915/gvt: Missed to cancel dma map for ggtt entries
drm/i915/gvt: Make MI_USER_INTERRUPT nop in cmd parser
drm/i915/gvt: Mark expected switch fall-through in handle_g2v_notification
drm/i915/gvt: throw error on unhandled vfio ioctls
Dave Airlie [Sun, 22 Apr 2018 22:52:54 +0000 (08:52 +1000)]
Merge tag 'drm-misc-fixes-2018-04-18-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-fixes:
stable: vc4: Fix memory leak during BO teardown (Daniel)
dp: Add i2c retry for LSPCON adapters (Imre)
hdcp: Fix device count mask (Ramalingam)
Cc: Daniel J Blueman <daniel@quora.org>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
* tag 'drm-misc-fixes-2018-04-18-1' of git://anongit.freedesktop.org/drm/drm-misc:
drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state
drm: Fix HDCP downstream dev count read
drm/vc4: Fix memory leak during BO teardown
Linus Torvalds [Sun, 22 Apr 2018 19:13:04 +0000 (12:13 -0700)]
Merge tag '4.17-rc1-SMB3-CIFS' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Various SMB3/CIFS fixes.
There are three more security related fixes in progress that are not
included in this set but they are still being tested and reviewed, so
sending this unrelated set of smaller fixes now"
* tag '4.17-rc1-SMB3-CIFS' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: fix typo in cifs_dbg
cifs: do not allow creating sockets except with SMB1 posix exensions
cifs: smbd: Dump SMB packet when configured
cifs: smbd: Check for iov length on sending the last iov
fs: cifs: Adding new return type vm_fault_t
cifs: smb2ops: Fix NULL check in smb2_query_symlink
Linus Torvalds [Sun, 22 Apr 2018 19:09:27 +0000 (12:09 -0700)]
Merge tag 'for-4.17-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"This contains a few fixups to the qgroup patches that were merged this
dev cycle, unaligned access fix, blockgroup removal corner case fix
and a small debugging output tweak"
* tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: print-tree: debugging output enhancement
btrfs: Fix race condition between delayed refs and blockgroup removal
btrfs: fix unaligned access in readdir
btrfs: Fix wrong btrfs_delalloc_release_extents parameter
btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
btrfs: qgroup: Use independent and accurate per inode qgroup rsv
btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT
Linus Torvalds [Sun, 22 Apr 2018 18:40:52 +0000 (11:40 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which
causes the possible CPU count to be wrong.
- Prevent 32bit truncation in calc_hpet_ref() which causes the TSC
calibration to fail
- Fix the page table setup for temporary text mappings in the resume
code which causes resume failures
- Make the page table dump code handle HIGHPTE correctly instead of
oopsing
- Support for topologies where NUMA nodes share an LLC to prevent an
invalid topology warning and further malfunction on such systems.
- Remove the now unused pci-nommu code
- Remove stale function declarations"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/power/64: Fix page-table setup for temporary text mapping
x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
x86,sched: Allow topologies where NUMA nodes share an LLC
x86/processor: Remove two unused function declarations
x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
x86: Remove pci-nommu.c
Linus Torvalds [Sun, 22 Apr 2018 17:49:02 +0000 (10:49 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of timer fixes:
- Evaluate the -ETIME condition correctly in the imx tpm driver
- Fix the evaluation order of a condition in posix cpu timers
- Use pr_cont() in the clockevents code to prevent ugly message
splitting
- Remove __current_kernel_time() which is now unused to prevent that
new users show up.
- Remove a stale forward declaration"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/imx-tpm: Correct -ETIME return condition check
posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated
timekeeping: Remove __current_kernel_time()
timers: Remove stale struct tvec_base forward declaration
clockevents: Fix kernel messages split across multiple lines
Linus Torvalds [Sun, 22 Apr 2018 17:17:01 +0000 (10:17 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A larger set of updates for perf.
Kernel:
- Handle the SBOX uncore monitoring correctly on Broadwell CPUs which
do not have SBOX.
- Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The
percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are
running on a machine. This adds the kernel facility and userspace
changes needed to show this information in 'perf script' and 'perf
report -D' (Alexey Budankov)
- Remove a WARN_ON() in the trace/kprobes code which is pointless
because the return error code is already telling the caller what's
wrong.
- Revert a fugly workaround for clang BPF targets.
- Fix sample_max_stack maximum check and do not proceed when an error
has been detected, return it to avoid misidentifying errors (Jiri
Olsa)
- Add SPDX identifiers and get rid of GPL boilerplate.
Tools:
- Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)
- Support MAP_FIXED_NOREPLACE, noticed when updating the
tools/include/ copies (Arnaldo Carvalho de Melo)
- Add '\n' at the end of parse-options error messages (Ravi Bangoria)
- Add s390 support for detailed/verbose PMU event description (Thomas
Richter)
- perf annotate fixes and improvements:
* Allow showing offsets in more than just jump targets, use the
new 'O' hotkey in the TUI, config ~/.perfconfig
annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho
de Melo)
* Use the resolved variable names from objdump disassembled lines
to make them more compact, just like was already done for some
instructions, like "mov", this eventually will be done more
generally, but let's now add some more to the existing mechanism
(Arnaldo Carvalho de Melo)
- perf record fixes:
* Change warning for missing topology sysfs entry to debug, as not
all architectures have those files, s390 being one of those
(Thomas Richter)
* Remove old error messages about things that are unlikely to be the
root cause in modern systems (Andi Kleen)
- perf sched fixes:
* Fix -g/--call-graph documentation (Takuya Yamamoto)
- perf stat:
* Enable 1ms interval for printing event counters values in
(Alexey Budankov)
- perf test fixes:
* Run dwarf unwind on arm32 (Kim Phillips)
* Remove unused ptrace.h include from LLVM test, sidestepping older
clang's lack of support for some asm constructs (Arnaldo
Carvalho de Melo)
* Fixup BPF test using epoll_pwait syscall function probe, to cope
with the syscall routines renames performed in this development
cycle (Arnaldo Carvalho de Melo)
- perf version fixes:
* Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version
--build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as
libaudit won't be used in that case, print info about
syscall_table support instead (Jin Yao)
- Build system fixes:
* Use HAVE_..._SUPPORT consistently (Jin Yao)
* Restore READ_ONCE() C++ compatibility in tools/include (Mark
Rutland)
* Give hints about package names needed to build jvmti (Arnaldo
Carvalho de Melo)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
coresight: Move to SPDX identifier
perf test BPF: Fixup BPF test using epoll_pwait syscall function probe
perf tests mmap: Show which tracepoint is failing
perf tools: Add '\n' at the end of parse-options error messages
perf record: Remove suggestion to enable APIC
perf record: Remove misleading error suggestion
perf hists browser: Clarify top/report browser help
perf mem: Allow all record/report options
perf trace: Support MAP_FIXED_NOREPLACE
perf: Remove superfluous allocation error check
perf: Fix sample_max_stack maximum check
perf: Return proper values for user stack errors
perf list: Add s390 support for detailed/verbose PMU event description
perf script: Extend misc field decoding with switch out event type
perf report: Extend raw dump (-D) out with switch out event type
perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]
tools/headers: Synchronize kernel ABI headers, v4.17-rc1
trace_kprobe: Remove warning message "Could not insert probe at..."
...
Linus Torvalds [Sun, 22 Apr 2018 16:48:13 +0000 (09:48 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
"A single fix for objtool so it uses the host C and LD flags and not
the target ones"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Support HOSTCFLAGS and HOSTLDFLAGS
Linus Torvalds [Sun, 22 Apr 2018 04:20:48 +0000 (21:20 -0700)]
Merge tag 'random_for_linus_stable' of git://git./linux/kernel/git/tytso/random
Pull /dev/random fixes from Ted Ts'o:
"Fix some bugs in the /dev/random driver which cause getrandom(2) to
unblock earlier than designed.
Thanks to Jann Horn from Google's Project Zero for pointing this out
to me"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: add new ioctl RNDRESEEDCRNG
random: crng_reseed() should lock the crng instance that it is modifying
random: set up the NUMA crng instances after the CRNG is fully initialized
random: use a different mixing algorithm for add_device_randomness()
random: fix crng_ready() test
Linus Torvalds [Sun, 22 Apr 2018 04:11:05 +0000 (21:11 -0700)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A regression fix, new unit test infrastructure and a build fix:
- Regression fix addressing support for the new NVDIMM label storage
area access commands (_LSI, _LSR, and _LSW).
The Intel specific version of these commands communicated the
"Device Locked" status on the label-storage-information command.
However, these new commands (standardized in ACPI 6.2) communicate
the "Device Locked" status on the label-storage-read command, and
the driver was missing the indication.
Reading from locked persistent memory is similar to reading
unmapped PCI memory space, returns all 1's.
- Unit test infrastructure is added to regression test the "Device
Locked" detection failure.
- A build fix is included to allow the "of_pmem" driver to be built
as a module and translate an Open Firmware described device to its
local numa node"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
MAINTAINERS: Add backup maintainers for libnvdimm and DAX
device-dax: allow MAP_SYNC to succeed
Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error"
libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()
tools/testing/nvdimm: enable labels for nfit_test.1 dimms
tools/testing/nvdimm: fix missing newline in nfit_test_dimm 'handle' attribute
tools/testing/nvdimm: support nfit_test_dimm attributes under nfit_test.1
tools/testing/nvdimm: allow custom error code injection
libnvdimm, dimm: handle EACCES failures from label reads
Linus Torvalds [Sat, 21 Apr 2018 17:32:16 +0000 (10:32 -0700)]
Merge tag 'sound-4.17-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few small fixes:
- a fix for the NULL-dereference in rawmidi compat ioctls, triggered
by fuzzer
- HD-audio Realtek codec quirks, a VIA controller fixup
- a long-standing bug fix in LINE6 MIDI"
* tag 'sound-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: hda/realtek - adjust the location of one mic
ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags
ALSA: hda - New VIA controller suppor no-snoop path
ALSA: line6: Use correct endpoint type for midi output
Linus Torvalds [Sat, 21 Apr 2018 17:28:15 +0000 (10:28 -0700)]
Merge tag 'linux-watchdog-4.17-rc2' of git://linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- fall-through fixes
- MAINTAINER change for hpwdt
- renesas-wdt: Add support for WDIOF_CARDRESET
- aspeed: set bootstatus during probe
* tag 'linux-watchdog-4.17-rc2' of git://www.linux-watchdog.org/linux-watchdog:
aspeed: watchdog: Set bootstatus during probe
watchdog: renesas-wdt: Add support for WDIOF_CARDRESET
watchdog: wafer5823wdt: Mark expected switch fall-through
watchdog: w83977f_wdt: Mark expected switch fall-through
watchdog: sch311x_wdt: Mark expected switch fall-through
watchdog: hpwdt: change maintainer.
Linus Torvalds [Sat, 21 Apr 2018 17:26:00 +0000 (10:26 -0700)]
Merge tag 'linux-kselftest-4.17-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
"A fix from Michael Ellerman to not run dnotify_test by default to
prevent Kselftest running forever"
* tag 'linux-kselftest-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/filesystems: Don't run dnotify_test by default
Linus Torvalds [Sat, 21 Apr 2018 17:20:50 +0000 (10:20 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- kasan: avoid pfn_to_nid() before the page array is initialised
- Fix typo causing the "upgrade" of known signals to SIGKILL
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: don't force known signals to SIGKILL
arm64: kasan: avoid pfn_to_nid() before page array is initialized
Linus Torvalds [Sat, 21 Apr 2018 15:15:16 +0000 (08:15 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
- "fork: unconditionally clear stack on fork" is a non-bugfix which got
lost during the merge window - performance concerns appear to have
been adequately addressed.
- and a bunch of fixes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create()
fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error
kexec_file: do not add extra alignment to efi memmap
proc: fix /proc/loadavg regression
proc: revalidate kernel thread inodes to root:root
autofs: mount point create should honour passed in mode
MAINTAINERS: add personal addresses for Sascha and Uwe
kasan: add no_sanitize attribute for clang builds
rapidio: fix rio_dma_transfer error handling
mm: enable thp migration for shmem thp
writeback: safer lock nesting
mm, pagemap: fix swap offset value for PMD migration entry
mm: fix do_pages_move status handling
fork: unconditionally clear stack on fork
Ingo Molnar [Sat, 21 Apr 2018 07:38:33 +0000 (09:38 +0200)]
Merge tag 'perf-urgent-for-mingo-4.17-20180420' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:
- Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE].
The percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are running
on a machine. This adds the kernel facility and userspace changes needed
to show this information in 'perf script' and 'perf report -D' (Alexey Budankov)
- Remove old error messages about things that are unlikely to be the root cause
in modern systems (Andi Kleen)
- Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)
- Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/
copies (Arnaldo Carvalho de Melo)
- Fixup BPF test using epoll_pwait syscall function probe, to cope with
the syscall routines renames performed in this development cycle (Arnaldo Carvalho de Melo)
- Fix sample_max_stack maximum check and do not proceed when an error
has been detected, return it to avoid misidentifying errors (Jiri Olsa)
- Add '\n' at the end of parse-options error messages (Ravi Bangoria)
- Add s390 support for detailed/verbose PMU event description (Thomas Richter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Matthew Wilcox [Fri, 20 Apr 2018 21:56:20 +0000 (14:56 -0700)]
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
f2fs specifies the __GFP_ZERO flag for allocating some of its pages.
Unfortunately, the page cache also uses the mapping's GFP flags for
allocating radix tree nodes. It always masked off the __GFP_HIGHMEM
flag, and masks off __GFP_ZERO in some paths, but not all. That causes
radix tree nodes to be allocated with a NULL list_head, which causes
backtraces like:
__list_del_entry+0x30/0xd0
list_lru_del+0xac/0x1ac
page_cache_tree_insert+0xd8/0x110
The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through
if they are ever used. Fix them all by using GFP_RECLAIM_MASK at the
innermost location, and remove it from earlier in the callchain.
Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: Chris Fries <cfries@google.com>
Debugged-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Fri, 20 Apr 2018 21:56:17 +0000 (14:56 -0700)]
mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create()
If there is heavy memory pressure, page allocation with __GFP_NOWAIT
fails easily even though it is an order-0 request. I got the warning below
9 times during a normal boot.
<snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
.. snip ..
Call trace:
dump_backtrace+0x0/0x4
dump_stack+0xa4/0xc0
warn_alloc+0xd4/0x15c
__alloc_pages_nodemask+0xf88/0x10fc
alloc_slab_page+0x40/0x18c
new_slab+0x2b8/0x2e0
___slab_alloc+0x25c/0x464
__kmalloc+0x394/0x498
memcg_kmem_get_cache+0x114/0x2b8
kmem_cache_alloc+0x98/0x3e8
mmap_region+0x3bc/0x8c0
do_mmap+0x40c/0x43c
vm_mmap_pgoff+0x15c/0x1e4
sys_mmap+0xb0/0xc8
el0_svc_naked+0x24/0x28
Mem-Info:
active_anon:17124 inactive_anon:193 isolated_anon:0
active_file:7898 inactive_file:712955 isolated_file:55
unevictable:0 dirty:27 writeback:18 unstable:0
slab_reclaimable:12250 slab_unreclaimable:23334
mapped:19310 shmem:212 pagetables:816 bounce:0
free:36561 free_pcp:1205 free_cma:35615
Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
lowmem_reserve[]: 0 1842 1842
Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
721350 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
945512 pages RAM
0 pages HighMem/MovableOnly
63408 pages reserved
51200 pages cma reserved
__memcg_schedule_kmem_cache_create() tries to create a shadow slab cache
and the worker allocation failure is not really critical because we will
retry on the next kmem charge. We might miss some charges but that
shouldn't be critical. The excessive allocation failure report is not
very helpful.
[mhocko@kernel.org: changelog update]
Link: http://lkml.kernel.org/r/20180418022912.248417-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Fri, 20 Apr 2018 21:56:13 +0000 (14:56 -0700)]
fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error
Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is
printing spurious messages under memory pressure due to map_addr == -ENOMEM.
9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already
Complain only if -EEXIST, and use %px for printing the address.
Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Fri, 20 Apr 2018 21:56:10 +0000 (14:56 -0700)]
kexec_file: do not add extra alignment to efi memmap
Chun-Yi reported a kernel warning message below:
WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c()
early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120
The problem is x86 kexec_file_load adds extra alignment to the efi
memmap: in bzImage64_load():
efi_map_sz = efi_get_runtime_map_size();
efi_map_sz = ALIGN(efi_map_sz, 16);
And __efi_memmap_init maps with the size including the alignment bytes
but efi_memmap_unmap uses nr_maps * desc_size which does not include the
extra bytes.
The alignment in kexec code is only needed for the kexec buffer internal
use. Actually kexec should pass the exact size of the efi memmap to the 2nd
kernel.
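As a rough illustration of the size mismatch (a userspace sketch, not the
kernel code): rounding the 0x118-byte size from the warning above up to a
16-byte boundary gives 0x120, the other size the warning reports. The helper
name align_up() is made up for this sketch; it is assumed to behave like a
power-of-two ALIGN() rounding.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of a.
 * Assumption for this sketch: a is a power of two.
 */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}
```

With these numbers, align_up(0x118, 16) is 0x120, so a mapping made with the
aligned size and an unmapping made with the exact size disagree by 8 bytes.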
Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com
Signed-off-by: Dave Young <dyoung@redhat.com>
Reported-by: joeyli <jlee@suse.com>
Tested-by: Randy Wright <rwright@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 20 Apr 2018 21:56:06 +0000 (14:56 -0700)]
proc: fix /proc/loadavg regression
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
API") changed last field of /proc/loadavg (last pid allocated) to be off
by one:
# unshare -p -f --mount-proc cat /proc/loadavg
0.00 0.00 0.00 1/60 2 <===
It should be 1 after first fork into pid namespace.
This is formally a regression but given how useless this field is I
don't think anyone is affected.
Bug was found by /proc testsuite!
Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 20 Apr 2018 21:56:03 +0000 (14:56 -0700)]
proc: revalidate kernel thread inodes to root:root
task_dump_owner() has the following code:
  mm = task->mm;
  if (mm) {
          if (get_dumpable(mm) != SUID_DUMP_USER) {
                  uid = ...
          }
  }
The check for ->mm is buggy: a kernel thread might be borrowing an mm,
and then the inode will go to some random uid:gid pair.
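A minimal userspace model of the problem, with made-up types. struct task,
struct mm, the is_kthread field and owner_uid() are all hypothetical and are
not the kernel's definitions; the early return for kernel threads is only one
plausible way to get the root:root result named in the subject line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's structures. */
struct mm {
    bool dumpable_by_user;   /* models get_dumpable(mm) == SUID_DUMP_USER */
    unsigned ns_root_uid;    /* uid picked when the mm is not user-dumpable */
};

struct task {
    bool is_kthread;         /* assumed marker for a kernel thread */
    const struct mm *mm;     /* may be borrowed by a kernel thread */
    unsigned uid;
};

/* Pick the uid shown on the /proc inode for this task. */
static unsigned owner_uid(const struct task *t)
{
    if (t->is_kthread)
        return 0;                     /* kernel threads: always root */
    if (t->mm && !t->mm->dumpable_by_user)
        return t->mm->ns_root_uid;    /* decision based on the task's own mm */
    return t->uid;
}
```

Without the first check, a kernel thread that happens to hold a borrowed mm
would take the second branch and inherit whatever that mm implies.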
Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 20 Apr 2018 21:55:59 +0000 (14:55 -0700)]
autofs: mount point create should honour passed in mode
The autofs file system mkdir inode operation blindly sets the created
directory mode to S_IFDIR | 0555, ignoring the passed in mode, which can
cause selinux dac_override denials.
But the function also checks if the caller is the daemon (as no-one else
should be able to do anything here) so there's no point in not honouring
the passed in mode, allowing the daemon to set appropriate mode when
required.
Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Fri, 20 Apr 2018 21:55:56 +0000 (14:55 -0700)]
MAINTAINERS: add personal addresses for Sascha and Uwe
The idea behind using kernel@pengutronix.de (i.e. the mail alias for the
kernel people at Pengutronix) as email address was to have a backup when
a given developer is on vacation or run over by a bus. Make this more
explicit by adding the alias as reviewer and use the personal address
for Sascha and me.
Link: http://lkml.kernel.org/r/20180413083312.11213-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Fri, 20 Apr 2018 21:55:52 +0000 (14:55 -0700)]
kasan: add no_sanitize attribute for clang builds
KASAN uses the __no_sanitize_address macro to disable instrumentation of
particular functions. Right now it's defined only for GCC builds, which
causes false positives when clang is used.
This patch adds a definition for clang.
Note that clang revision 329612 or higher is required.
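One way such a per-compiler definition can look, shown as a standalone
sketch. The macro name NO_SANITIZE_ADDRESS and the function sum_bytes() are
invented for this example; the attribute spellings are the ones documented by
clang and GCC. This is not necessarily the exact text the patch adds.

```c
#include <stddef.h>

/* Hypothetical macro name; the kernel's own macro is __no_sanitize_address. */
#if defined(__clang__)
#define NO_SANITIZE_ADDRESS __attribute__((no_sanitize("address")))
#elif defined(__GNUC__)
#define NO_SANITIZE_ADDRESS __attribute__((no_sanitize_address))
#else
#define NO_SANITIZE_ADDRESS
#endif

/* A function the address sanitizer is asked not to instrument. */
NO_SANITIZE_ADDRESS
static unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

The __clang__ test comes first because clang also defines __GNUC__.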
[andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Lawrence <paullawrence@google.com>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ioan Nicu [Fri, 20 Apr 2018 21:55:49 +0000 (14:55 -0700)]
rapidio: fix rio_dma_transfer error handling
Some of the mport_dma_req structure members were initialized late
inside the do_dma_request() function, just before submitting the
request to the dma engine. But we have some error branches before
that. In case of such an error, the code would return on the error
path and trigger the calling of dma_req_free() with a req structure
which is not completely initialized. This causes a NULL pointer
dereference in dma_req_free().
This patch fixes these error branches by making sure that all
necessary mport_dma_req structure members are initialized in
rio_dma_transfer() immediately after the request structure gets
allocated.
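The pattern described above can be sketched in plain C. All names here
(struct owner, struct request, request_alloc(), request_free(),
request_submit()) are hypothetical and only model the idea: every member the
free routine touches is set right after allocation, so any early error branch
can call the free routine safely.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's private data and request. */
struct owner {
    int open_requests;
};

struct request {
    struct owner *owner;   /* dereferenced by request_free() */
    void *buf;
};

static struct request *request_alloc(struct owner *o)
{
    struct request *r = calloc(1, sizeof(*r));

    if (!r)
        return NULL;
    r->owner = o;          /* initialized before any step that can fail */
    o->open_requests++;
    return r;
}

static void request_free(struct request *r)
{
    r->owner->open_requests--;   /* would crash if owner were still NULL */
    free(r->buf);                /* free(NULL) is a no-op */
    free(r);
}

/* Returns 0 on success, -1 on any error; never leaks the request. */
static int request_submit(struct owner *o, size_t len)
{
    struct request *r = request_alloc(o);

    if (!r)
        return -1;
    if (len == 0) {              /* early error branch */
        request_free(r);
        return -1;
    }
    r->buf = malloc(len);
    if (!r->buf) {
        request_free(r);
        return -1;
    }
    request_free(r);             /* stand-in for normal completion */
    return 0;
}
```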
Link: http://lkml.kernel.org/r/20180412150605.GA31409@nokia.com
Fixes: bbd876adb8c72 ("rapidio: use a reference count for struct mport_dma_req")
Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Frank Kunz <frank.kunz@nokia.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 20 Apr 2018 21:55:45 +0000 (14:55 -0700)]
mm: enable thp migration for shmem thp
My testing for the latest kernel supporting thp migration showed an
infinite loop in offlining the memory block that is filled with shmem
thps. We can get out of the loop with a signal, but kernel should return
with failure in this case.
What happens in the loop is that scan_movable_pages() repeats returning
the same pfn without any progress. That's because page migration always
fails for shmem thps.
In memory offline code, memory blocks containing unmovable pages should be
prevented from being offline targets by has_unmovable_pages() inside
start_isolate_page_range(). So it's possible to change migratability for
non-anonymous thps to avoid the issue, but it introduces more complex and
thp-specific handling in migration code, so it might not be good.
So this patch suggests fixing the issue by enabling thp migration for
shmem thp. Both anon and shmem thps are migratable, so we don't need to
precheck the type of thp.
Link: http://lkml.kernel.org/r/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp
Fixes: commit 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <zi.yan@sent.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Fri, 20 Apr 2018 21:55:42 +0000 (14:55 -0700)]
writeback: safer lock nesting
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.
unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains. Switches occur when
enough writes are issued from a new domain.
This existing pattern is thus suspicious:
lock_page_memcg(page);
unlocked_inode_to_wb_begin(inode, &locked);
...
unlocked_inode_to_wb_end(inode, locked);
unlock_page_memcg(page);
If inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock. This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().
  truncate
    __cancel_dirty_page
      lock_page_memcg
      unlocked_inode_to_wb_begin
      unlocked_inode_to_wb_end
        <interrupts mistakenly enabled>
                                    <interrupt>
                                    end_page_writeback
                                      test_clear_page_writeback
                                        lock_page_memcg
                                          <deadlock>
      unlock_page_memcg
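The interrupt-state problem can be modelled in userspace with a single
boolean. This is only a sketch: the model_irq_*() helpers and the two
nested_*() functions are made up here, and no real locks are taken. It
contrasts an inner disable/enable pair with an inner save/restore pair inside
an outer save/restore section.

```c
#include <stdbool.h>

/* Models the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

static bool model_irq_save(void)
{
    bool was = irqs_enabled;

    irqs_enabled = false;
    return was;
}

static void model_irq_restore(bool was) { irqs_enabled = was; }
static void model_irq_disable(void)     { irqs_enabled = false; }
static void model_irq_enable(void)      { irqs_enabled = true; }

/*
 * Outer irqsave section with an inner disable/enable pair.
 * Returns whether interrupts are on while the outer section is still held.
 */
static bool nested_plain(void)
{
    bool outer = model_irq_save();   /* like the lock_page_memcg() irqsave path */
    bool leaked;

    model_irq_disable();             /* like an inner spin_lock_irq() */
    model_irq_enable();              /* like an inner spin_unlock_irq() */
    leaked = irqs_enabled;           /* outer section not yet released */
    model_irq_restore(outer);
    return leaked;
}

/* Same nesting, but the inner section saves and restores the state. */
static bool nested_saved(void)
{
    bool outer = model_irq_save();
    bool inner = model_irq_save();
    bool leaked;

    model_irq_restore(inner);
    leaked = irqs_enabled;
    model_irq_restore(outer);
    return leaked;
}
```

In this model nested_plain() reports interrupts enabled inside the outer
section, while nested_saved() does not, which is why a save/restore pair
nests safely.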
Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).
If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:
  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
    echo $BASHPID > a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &
  while true; do
    sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
    echo $SLEEP > a/cgroup.procs
    echo $SLEEP > b/cgroup.procs
  done
The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel. I suggest we should, to prevent future
surprises. And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146
[gthelen@google.com: v4]
Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Wang Long <wanglong19@meituan.com>
Acked-by: Wang Long <wanglong19@meituan.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org> [v4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Ying [Fri, 20 Apr 2018 21:55:38 +0000 (14:55 -0700)]
mm, pagemap: fix swap offset value for PMD migration entry
The swap offset reported by /proc/<pid>/pagemap may not be correct for
PMD migration entries. If addr passed into pagemap_pmd_range() isn't
aligned with PMD start address, the swap offset reported doesn't
reflect this. And in the loop to report information of each sub-page,
the swap offset isn't increased accordingly, as is done for the PFN.
This may happen after opening /proc/<pid>/pagemap and seeking to a page
whose address doesn't align with a PMD start address. I have verified
this with a simple test program.
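The adjustment being described can be written down as a small calculation.
This is a sketch under stated assumptions: 4 KiB pages and a 2 MiB PMD (as on
x86-64), with the SKETCH_* constants and the helper names subpage_index() and
reported_offset() invented for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define SKETCH_PMD_SHIFT  21   /* assumption: 2 MiB PMD */
#define SKETCH_PMD_MASK   (~((UINT64_C(1) << SKETCH_PMD_SHIFT) - 1))

/* Index of the page containing addr within its PMD-sized region. */
static uint64_t subpage_index(uint64_t addr)
{
    return (addr & ~SKETCH_PMD_MASK) >> SKETCH_PAGE_SHIFT;
}

/*
 * Swap offset to report for addr, given the offset stored in the PMD
 * entry (which describes the first sub-page of the region).
 */
static uint64_t reported_offset(uint64_t pmd_entry_offset, uint64_t addr)
{
    return pmd_entry_offset + subpage_index(addr);
}
```

For an address three pages past a PMD boundary, the reported offset is the
entry's offset plus 3; each further sub-page in the loop adds one more.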
BTW: migration swap entries have PFN information, do we need to restrict
whether to show them?
[akpm@linux-foundation.org: fix typo, per Huang, Ying]
Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Jerome Glisse" <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 20 Apr 2018 21:55:35 +0000 (14:55 -0700)]
mm: fix do_pages_move status handling
Li Wang has reported that LTP move_pages04 test fails with the current
tree:
LTP move_pages04:
TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT
The test allocates an array of two pages, one is present while the other
is not (resp. backed by zero page) and it expects EFAULT for the second
page as the man page suggests. We are reporting EPERM which doesn't make
any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter
THP migration").
do_pages_move tries to handle as many pages in one batch as possible so we
queue all pages with the same node target together and that corresponds to
[start, i] range which is then used to update status array.
add_page_for_migration will correctly notice the zero (resp. !present)
page and returns with EFAULT which gets written to the status. But if
this is the last page in the array we do not update start and so the last
store_status after the loop will overwrite the range of the last batch
with NUMA_NO_NODE (which corresponds to EPERM).
Fix this by simply bailing out from the last flush if the pagelist is
empty as there is clearly nothing more to do.
Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org
Fixes: cf5f16b23ec9 ("mm: unclutter THP migration")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 20 Apr 2018 21:55:31 +0000 (14:55 -0700)]
fork: unconditionally clear stack on fork
One of the classes of kernel stack content leaks[1] is the exposure of
prior heap or stack contents when a new process stack is
allocated. Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.
Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
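A userspace sketch of the idea, assuming a one-slot cache of freed stacks.
SKETCH_STACK_SIZE, cached_stack, stack_get() and stack_put() are made-up
names for illustration and do not correspond to the kernel's allocator; the
point is only that a reused buffer is cleared before the next owner sees it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_STACK_SIZE 4096   /* illustrative size, not the kernel's */

/* One-slot cache holding the most recently released stack, if any. */
static unsigned char *cached_stack;

/* Hand out a stack for a new task, reusing the cached one when present. */
static unsigned char *stack_get(void)
{
    unsigned char *s = cached_stack;

    if (s) {
        cached_stack = NULL;
        memset(s, 0, SKETCH_STACK_SIZE);   /* wipe the previous owner's data */
        return s;
    }
    return calloc(1, SKETCH_STACK_SIZE);
}

/* Release a stack; its contents stay in place until it is reused. */
static void stack_put(unsigned char *s)
{
    if (cached_stack)
        free(s);
    else
        cached_stack = s;
}
```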
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.
Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54
and after:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46
Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.
[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.
before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean: 221015379122.60
Std Dev: 4662486552.47
after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean: 217745009865.40
Std Dev: 5935559279.99
It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aurelien Aptel [Thu, 19 Apr 2018 08:44:20 +0000 (10:44 +0200)]
CIFS: fix typo in cifs_dbg
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reported-by: Long Li <longli@microsoft.com>
Steve French [Fri, 20 Apr 2018 17:19:07 +0000 (12:19 -0500)]
cifs: do not allow creating sockets except with SMB1 posix exensions
RHBZ: 1453123
Since at least the 3.10 kernel and likely a lot earlier we have
not been able to create unix domain sockets in a cifs share
when mounted using the SFU mount option (except when mounted
with the cifs unix extensions to Samba e.g.)
Trying to create a socket, for example using the af_unix command from
xfstests, will cause:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
Since no one uses or depends on being able to create unix domains sockets
on a cifs share the easiest fix to stop this vulnerability is to simply
not allow creation of any other special files than char or block devices
when sfu is used.
Added update to Ronnie's patch to handle a tcon link leak, and
to address a buf leak noticed by Gustavo and Colin.
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
CC: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Cc: stable@vger.kernel.org
Linus Torvalds [Fri, 20 Apr 2018 17:56:32 +0000 (10:56 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"A couple of fixes for the thermal subsystem"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
dt-bindings: thermal: Remove "cooling-{min|max}-level" properties
dt-bindings: thermal: remove no longer needed samsung thermal properties
Linus Torvalds [Fri, 20 Apr 2018 17:41:31 +0000 (10:41 -0700)]
Merge tag 'mmc-v4.17-3' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC host fixes:
- sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode
- renesas_sdhi_internal_dmac: Avoid data corruption by limiting
DMA RX"
* tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs
mmc: sdhci-pci: Only do AMD tuning for HS200
Linus Torvalds [Fri, 20 Apr 2018 17:39:44 +0000 (10:39 -0700)]
Merge tag 'md/4.17-rc1' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"Three small fixes for MD:
- md-cluster fix for faulty device from Guoqing
- writehint fix for writebehind IO for raid1 from Mariusz
- a live lock fix for interrupted recovery from Yufen"
* tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
raid1: copy write hint from master bio to behind bio
md/raid1: exit sync request if MD_RECOVERY_INTR is set
md-cluster: don't update recovery_offset for faulty device
Long Li [Tue, 17 Apr 2018 19:17:10 +0000 (12:17 -0700)]
cifs: smbd: Dump SMB packet when configured
When sending through SMB Direct, also dump the packet in SMB send path.
Also fixed a typo in debug message.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Qu Wenruo [Wed, 11 Apr 2018 09:08:12 +0000 (17:08 +0800)]
btrfs: print-tree: debugging output enhancement
This patch enhances the following things:
- tree block header
* add generation and owner output for node and leaf
- node pointer generation output
- allow btrfs_print_tree() to not follow nodes
* just like btrfs-progs
Please note that, although function btrfs_print_tree() is not called by
anyone right now, it's still a pretty useful function to debug kernel.
So that function is still kept for later use.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 18 Apr 2018 06:41:54 +0000 (09:41 +0300)]
btrfs: Fix race condition between delayed refs and blockgroup removal
When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.
task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
FS:  00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
btrfs_run_delayed_refs+0x68/0x250 [btrfs]
btrfs_should_end_transaction+0x42/0x60 [btrfs]
btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
evict+0xc6/0x190
do_unlinkat+0x19c/0x300
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fbf589c57a7
To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows decoupling the
querying for the spaceinfo from querying the possibly deleted bg.
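A sketch of how such a flag can replace the lookup. The types and names below
(enum space_kind, struct head_ref_sketch, head_ref_make(), space_kind_for())
are hypothetical and much simpler than the btrfs structures; the is_data
field is an assumption added so the example has three cases.

```c
/* Hypothetical categories of space accounting. */
enum space_kind { SPACE_DATA, SPACE_SYSTEM, SPACE_METADATA };

/* Hypothetical head ref: the flags are recorded when the ref is inserted. */
struct head_ref_sketch {
    unsigned is_data : 1;
    unsigned is_system : 1;
};

/* Populate the flags at insertion time, while the block group still exists. */
static struct head_ref_sketch head_ref_make(int is_data, int is_system)
{
    struct head_ref_sketch h;

    h.is_data = is_data ? 1u : 0u;
    h.is_system = is_system ? 1u : 0u;
    return h;
}

/* At cleanup time, choose the space kind from the flags alone. */
static enum space_kind space_kind_for(const struct head_ref_sketch *h)
{
    if (h->is_data)
        return SPACE_DATA;
    if (h->is_system)
        return SPACE_SYSTEM;
    return SPACE_METADATA;
}
```

Because the decision uses only data stored in the head ref, it no longer
matters whether the block group has been removed in the meantime.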
Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Howells [Fri, 20 Apr 2018 12:35:02 +0000 (13:35 +0100)]
vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion
In do_mount() when the MS_* flags are being converted to MNT_* flags,
MS_RDONLY got accidentally converted to SB_RDONLY.
Undo this change.
Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 18 Apr 2018 08:38:34 +0000 (09:38 +0100)]
afs: Fix server record deletion
AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.
Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.
The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.
The oops looks something like:
BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
...
Workqueue: kafsd afs_process_async_call [kafs]
RIP: 0010:afs_find_server+0x271/0x36f [kafs]
...
Call Trace:
afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
afs_deliver_to_call+0x1ee/0x5e8 [kafs]
afs_process_async_call+0x5b/0xd0 [kafs]
process_one_work+0x2c2/0x504
worker_thread+0x1d4/0x2ac
kthread+0x11f/0x127
ret_from_fork+0x24/0x30
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Apr 2018 16:34:39 +0000 (09:34 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Unbalanced refcounting in TIPC, from Jon Maloy.
2) Only allow TCP_MD5SIG to be set on sockets in close or listen state.
Once the connection is established it makes no sense to change this.
From Eric Dumazet.
3) Missing attribute validation in neigh_dump_table(), also from Eric
Dumazet.
4) Fix address comparisons in SCTP, from Xin Long.
5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller.
6) Fix tunnel refcounting in l2tp, from Guillaume Nault.
7) Fix double list insert in team driver, from Paolo Abeni.
8) af_vsock.ko module was accidentally made unremovable, from Stefan
Hajnoczi.
9) Fix reference to freed llc_sap object in llc stack, from Cong Wang.
10) Don't assume netdevice struct is DMA'able memory in virtio_net
driver, from Michael S. Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
net/smc: fix shutdown in state SMC_LISTEN
bnxt_en: Fix memory fault in bnxt_ethtool_init()
virtio_net: sparse annotation fix
virtio_net: fix adding vids on big-endian
virtio_net: split out ctrl buffer
net: hns: Avoid action name truncation
docs: ip-sysctl.txt: fix name of some ipv6 variables
vmxnet3: fix incorrect dereference when rxvlan is disabled
llc: hold llc_sap before release_sock()
MAINTAINERS: Direct networking documentation changes to netdev
atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"
net: qmi_wwan: add Wistron Neweb D19Q1
net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"
net: stmmac: Disable ACS Feature for GMAC >= 4
net: mvpp2: Fix DMA address mask size
net: change the comment of dev_mc_init
net: qualcomm: rmnet: Fix warning seen with fill_info
tun: fix vlan packet truncation
tipc: fix infinite loop when dumping link monitor summary
tipc: fix use-after-free in tipc_nametbl_stop
...
Linus Torvalds [Fri, 20 Apr 2018 16:15:14 +0000 (09:15 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Assorted fixes.
Some of that is only a matter with fault injection (broken handling of
small allocation failure in various mount-related places), but the
last one is a root-triggerable stack overflow, and combined with
userns it gets really nasty ;-/"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Don't leak MNT_INTERNAL away from internal mounts
mm,vmscan: Allow preallocating memory for register_shrinker().
rpc_pipefs: fix double-dput()
orangefs_kill_sb(): deal with allocation failures
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations
Linus Torvalds [Fri, 20 Apr 2018 16:08:37 +0000 (09:08 -0700)]
Merge tag 'ecryptfs-4.17-rc2-fixes' of git://git./linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Minor cleanups and a bug fix to completely ignore unencrypted
filenames in the lower filesystem when filename encryption is enabled
at the eCryptfs layer"
* tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: don't pass up plaintext names when using filename encryption
ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
Linus Torvalds [Fri, 20 Apr 2018 16:01:26 +0000 (09:01 -0700)]
Merge tag 'for_v4.17-rc2' of git://git./linux/kernel/git/jack/linux-fs
- isofs memory leak fix
- two fsnotify fixes of event mask handling
- udf fix of UTF-16 handling
- couple other smaller cleanups
* tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix leak of UTF-16 surrogates into encoded strings
fs: ext2: Adding new return type vm_fault_t
isofs: fix potential memory leak in mount option parsing
MAINTAINERS: add an entry for FSNOTIFY infrastructure
fsnotify: fix typo in a comment about mark->g_list
fsnotify: fix ignore mask logic in send_to_group()
isofs compress: Remove VLA usage
fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
fanotify: fix logic of events on child
Linus Torvalds [Fri, 20 Apr 2018 15:55:30 +0000 (08:55 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
- suspend/resume handling fix for Raydium I2C-connected touchscreen
from Aaron Ma
- protocol fixup for certain BT-connected Wacoms from Aaron Armstrong
Skomra
- battery level reporting fix on BT-connected mice from Dmitry Torokhov
- hidraw race condition fix from Rodrigo Rivas Costa
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: i2c-hid: fix inverted return value from i2c_hid_command()
HID: i2c-hid: Fix resume issue on Raydium touchscreen device
HID: wacom: bluetooth: send exit report for recent Bluetooth devices
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
HID: input: fix battery level reporting on BT mice
Linus Torvalds [Fri, 20 Apr 2018 15:51:55 +0000 (08:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
"Shadow variable API list_head initialization fix from Petr Mladek"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Allow to call a custom callback when freeing shadow variables
livepatch: Initialize shadow variables safely by a custom callback
Linus Torvalds [Fri, 20 Apr 2018 15:36:04 +0000 (08:36 -0700)]
Merge tag 'for-linus-4.17-rc2-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- some fixes of kmalloc() flags
- one fix of the xenbus driver
- an update of the pv sound driver interface needed for a driver which
will go through the sound tree
* tag 'for-linus-4.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: xenbus_dev_frontend: Really return response string
xen/sndif: Sync up with the canonical definition in Xen
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_reg_add
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in xen_pcibk_config_quirks_init
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_device_alloc
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_init_device
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_probe
Linus Torvalds [Fri, 20 Apr 2018 15:25:31 +0000 (08:25 -0700)]
Merge tag 'mips_fixes_4.17_1' of git://git./linux/kernel/git/jhogan/mips
Pull MIPS fixes from James Hogan:
- io: Add barriers to read*() & write*()
- dts: Fix boston PCI bus DTC warnings (4.17)
- memset: Several corner case fixes (one 3.10, others longer)
* tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: uaccess: Add micromips clobbers to bzero invocation
MIPS: memset.S: Fix clobber of v1 in last_fixup
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: EVA & fault support for small_memset
MIPS: dts: Boston: Fix PCI bus dtc warnings:
MIPS: io: Add barrier after register read in readX()
MIPS: io: Prevent compiler reordering writeX()
Linus Torvalds [Fri, 20 Apr 2018 15:23:30 +0000 (08:23 -0700)]
Merge tag 'powerpc-4.17-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix an off-by-one bug in our alternative asm patching which leads to
incorrectly patched code. This bug lay dormant for nearly 10 years
but we finally hit it due to a recent change.
- Fix lockups when running KVM guests on Power8 due to a missing check
when a thread that's running KVM comes out of idle.
- Fix an out-of-spec behaviour in the XIVE code (P9 interrupt
controller).
- Fix EEH handling of bridge MMIO windows.
- Prevent crashes in our RFI fallback flush handler if firmware didn't
tell us the size of the L1 cache (only seen on simulators).
Thanks to: Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling.
* tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kvm: Fix lockups when running KVM guests on Power8
powerpc/eeh: Fix enabling bridge MMIO windows
powerpc/xive: Fix trying to "push" an already active pool VP
powerpc/64s: Default l1d_size to 64K in RFI fallback flush
powerpc/lib: Fix off-by-one in alternate feature patching
Linus Torvalds [Fri, 20 Apr 2018 15:01:38 +0000 (08:01 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes and kexec-file-load from Martin Schwidefsky:
"After the common code kexec patches went in via Andrew we can now push
the architecture parts to implement the kexec-file-load system call.
Plus a few more bug fixes and cleanups, this includes an update to the
default configurations"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/signal: cleanup uapi struct sigaction
s390: rename default_defconfig to debug_defconfig
s390: remove gcov defconfig
s390: update defconfig
s390: add support for IBM z14 Model ZR1
s390: remove couple of duplicate includes
s390/boot: remove unused COMPILE_VERSION and ccflags-y
s390/nospec: include cpu.h
s390/decompressor: Ignore file vmlinux.bin.full
s390/kexec_file: add generated files to .gitignore
s390/Kconfig: Move kexec config options to "Processor type and features"
s390/kexec_file: Add ELF loader
s390/kexec_file: Add crash support to image loader
s390/kexec_file: Add image loader
s390/kexec_file: Add kexec_file_load system call
s390/kexec_file: Add purgatory
s390/kexec_file: Prepare setup.h for kexec_file_load
s390/smsgiucv: disable SMSG on module unload
s390/sclp: avoid potential usage of uninitialized value
Oskar Senft [Fri, 23 Mar 2018 13:11:30 +0000 (09:11 -0400)]
perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
SBOX on some Broadwell CPUs is broken because it's enabled unconditionally
despite the fact that there are no SBOXes available.
Check the Power Control Unit CAPID4 register to determine the number of
available SBOXes on the particular CPU before trying to enable them. If
there are none, nullify the SBOX descriptor so that no attempt is made to
initialize it.
Signed-off-by: Oskar Senft <osk@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark van Dijk <mark@voidzero.net>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com
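The check described in the SBOX fix above can be sketched roughly as follows. This is a minimal user-space illustration, not the driver code: the field position inside CAPID4 (`CAPID4_SBOX_SHIFT`, `CAPID4_SBOX_MASK`), the `demo_uncore_type` structure and the `demo_filter_sbox()` helper are all assumptions made for the example, and the real driver reads the register from the PCU's PCI configuration space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: the SBOX information is a 2-bit field at bits 7:6 of CAPID4.
 * The commit message only says the register tells how many SBOXes exist;
 * the exact layout used here is illustrative.
 */
#define CAPID4_SBOX_SHIFT 6u
#define CAPID4_SBOX_MASK  0x3u

/* Hypothetical stand-in for the driver's per-unit descriptor. */
struct demo_uncore_type {
	const char *name;
};

/* Extract the (assumed) SBOX field from a raw CAPID4 value. */
static unsigned int demo_capid4_sbox_field(uint32_t capid4)
{
	return (capid4 >> CAPID4_SBOX_SHIFT) & CAPID4_SBOX_MASK;
}

/*
 * Nullify the descriptor slot when the register reports no SBOX, so later
 * initialization code skips it. Returns true if the descriptor was kept.
 */
static bool demo_filter_sbox(struct demo_uncore_type **slot, uint32_t capid4)
{
	if (demo_capid4_sbox_field(capid4) == 0) {
		*slot = NULL;
		return false;
	}
	return true;
}
```

The design point is that the descriptor is dropped before any initialization runs, rather than letting initialization fail on hardware that lacks the unit.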
Stephane Eranian [Fri, 23 Mar 2018 13:11:29 +0000 (09:11 -0400)]
perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
This reverts commit 3b94a891667c ("perf/x86/intel/uncore: Remove
SBOX support for Broadwell server")
Revert because there exists a proper workaround for Broadwell-EP servers
without SBOX now. Note that BDX-DE does not have a SBOX.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: osk@google.com
Cc: mark@voidzero.net
Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com
Joerg Roedel [Thu, 19 Apr 2018 18:26:00 +0000 (20:26 +0200)]
x86/power/64: Fix page-table setup for temporary text mapping
On a system with 4-level page-tables there is no p4d, so the pud in the pgd
should be mapped. The old code before commit fb43d6cb91ef already did that.
The change from the above commit creates an invalid page table, which causes
undefined behavior. In one report it caused triple faults.
Fix it by changing the p4d back to pud.
Fixes: fb43d6cb91ef ('x86/mm: Do not auto-massage page protections')
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
Cc: hpa@zytor.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1524162360-26179-1-git-send-email-joro@8bytes.org
Al Viro [Fri, 20 Apr 2018 02:03:08 +0000 (22:03 -0400)]
Don't leak MNT_INTERNAL away from internal mounts
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.
Cc: stable@kernel.org
Reported-by: Alexander Aring <aring@mojatatu.com>
Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
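The idea in the MNT_INTERNAL fix above, that a copy of an internal mount must not inherit the internal marker, can be illustrated with a toy model. Everything here is hypothetical: `demo_mount`, `demo_clone_mount()` and the `DEMO_` flag values are invented for the sketch and are not the kernel's structures or the actual patch.

```c
/* Hypothetical flag bits for the toy model (not taken from the kernel). */
#define DEMO_MNT_READONLY 0x0040u
#define DEMO_MNT_INTERNAL 0x4000u

/* Toy stand-in for a mount: only the flags matter for this sketch. */
struct demo_mount {
	unsigned int flags;
};

/*
 * Copy a mount but strip the "internal" marker, so that only the original
 * kernel-created mount keeps it and bind-mount copies never do.
 */
static struct demo_mount demo_clone_mount(const struct demo_mount *old)
{
	struct demo_mount copy = *old;

	copy.flags &= ~DEMO_MNT_INTERNAL;
	return copy;
}
```

Other flags, such as the read-only bit in this model, are carried over unchanged; only the marker that is meant to stay with the original is masked off.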
Dave Jiang [Fri, 13 Apr 2018 20:47:40 +0000 (13:47 -0700)]
MAINTAINERS: Add backup maintainers for libnvdimm and DAX
Add additional maintainers for libnvdimm-related code and DAX.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dave Jiang [Thu, 19 Apr 2018 20:39:43 +0000 (13:39 -0700)]
device-dax: allow MAP_SYNC to succeed
MAP_SYNC is a nop for device-dax. Allow MAP_SYNC to succeed on device-dax
to eliminate special casing between device-dax and fs-dax as to when the
flag can be specified. Device-dax users already implicitly assume that they do
not need to call fsync(), and this enables them to explicitly check for this
capability.
Cc: <stable@vger.kernel.org>
Fixes: b6fb293f2497 ("mm: Define MAP_SYNC and VM_SYNC flags")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
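From user space, the explicit capability check that the device-dax MAP_SYNC change above enables might look like the following sketch. The helper name `demo_map_dax_sync()` and any device path passed to it are assumptions for illustration; the fallback flag values are only used when the libc headers do not already provide them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the generic Linux uapi. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Try to map `len` bytes of `path` with MAP_SYNC. MAP_SHARED_VALIDATE makes
 * the kernel reject flags it does not support instead of ignoring them, so
 * a successful return means MAP_SYNC was accepted for this file.
 * Returns the mapping, or MAP_FAILED on any error.
 */
static void *demo_map_dax_sync(const char *path, size_t len)
{
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return MAP_FAILED;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	close(fd);	/* the mapping, if any, stays valid after close */
	return addr;
}
```

A caller would pass a device node such as a `/dev/dax` instance and a length that satisfies the device's alignment requirements, and treat `MAP_FAILED` as "MAP_SYNC not available here".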
Dan Williams [Thu, 19 Apr 2018 22:07:42 +0000 (15:07 -0700)]
Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error"
With commit df3f126482db ("libnvdimm, of_pmem: use dev_to_node() instead
of of_node_to_nid()") it is now possible to allow of_pmem to be built as
a module as originally implemented.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Rob Herring [Mon, 16 Apr 2018 16:58:16 +0000 (11:58 -0500)]
libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()
Remove the direct dependency on of_node_to_nid() by using dev_to_node()
instead. Any DT platform device will have its NUMA node id set when the
device is created.
With this, commit 291717b6fbdb ("libnvdimm, of_pmem: workaround OF_NUMA=n
build error") can be reverted.
Fixes: 717197608952 ("libnvdimm: Add device-tree based driver")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>