platform/kernel/linux-rpi.git
3 years agoperf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE
Leo Yan [Fri, 6 Nov 2020 09:48:48 +0000 (17:48 +0800)]
perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE

When user doesn't specify event name, perf c2c tool enables both the
load and store events, and this leads to failure for opening the
duplicate PMU device for AUX trace.

After the memory event PERF_MEM_EVENTS__LOAD_STORE is introduced, when
the user doesn't specify event name, this patch converts the required
operation to PERF_MEM_EVENTS__LOAD_STORE if the arch supports it.
Otherwise, the tool still rolls back to enable events
PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
Leo Yan [Fri, 6 Nov 2020 09:48:47 +0000 (17:48 +0800)]
perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE

On the architectures with perf memory profiling, two types of hardware
events have been supported: load and store; if want to profile memory
for both load and store operations, the tool will use these two events
at the same time, the usage is:

  # perf mem record -t load,store -- uname

But this cannot be applied for AUX tracing event, the same PMU event can
be used to only trace memory load, or only memory store, or trace for
both memory load and store.

This patch introduces a new event PERF_MEM_EVENTS__LOAD_STORE, which is
used to support the event which can record both memory load and store
operations.

When user specifies memory operation type as 'load,store', or doesn't
set type so use 'load,store' as default, if the arch supports the event
PERF_MEM_EVENTS__LOAD_STORE, the tool will convert the required
operations to this single event; otherwise, if the arch doesn't support
PERF_MEM_EVENTS__LOAD_STORE, the tool rolls back to enable both events
PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE, which keeps the same
behaviour with before.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf mem: Introduce weak function perf_mem_events__ptr()
Leo Yan [Fri, 6 Nov 2020 09:48:46 +0000 (17:48 +0800)]
perf mem: Introduce weak function perf_mem_events__ptr()

Different architectures might use different event or different event
parameters for memory profiling, this patch introduces a weak
perf_mem_events__ptr() function which allows to return back a
architecture specific memory event.

Since the variable 'perf_mem_events' can be only accessed by the
perf_mem_events__ptr() function, mark the variable as 'static', this
allows the architectures to define its own memory event array.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf mem: Search event name with more flexible path
Leo Yan [Fri, 6 Nov 2020 09:48:45 +0000 (17:48 +0800)]
perf mem: Search event name with more flexible path

The perf tool searches a memory event name under the folder
'/sys/devices/cpu/events/', this leads to the limitation for the
selection of a memory profiling event which must be under this folder.

Thus it's impossible to use any other event as memory event which is not
under this specific folder, e.g. Arm SPE hardware event is not located
in '/sys/devices/cpu/events/' so it cannot be enabled for memory
profiling.

This patch changes to search folder from '/sys/devices/cpu/events/' to
'/sys/devices', so it give flexibility to find events which can be used
for memory profiling.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf jevents: Add test for arch std events
John Garry [Thu, 22 Oct 2020 11:02:27 +0000 (19:02 +0800)]
perf jevents: Add test for arch std events

Recently there was an undetected breakage for std arch event support.

Add support in "PMU events" testcase to detect such breakages.

For this, the "test" arch needs has support added to process std arch
events. And a test event is added for the test, ifself.

Also add a few code comments to help understand the code a bit better.

Committer testing:

Before:

  # perf test -vv pmu  |& grep l3_cache_rd
  #

After:

  # perf test -vv pmu  |& grep l3_cache_rd
  testing event table l3_cache_rd: pass
  testing aliases PMU cpu: matched event l3_cache_rd
  #

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/1603364547-197086-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf jevents: Tidy error handling
John Garry [Thu, 22 Oct 2020 11:02:26 +0000 (19:02 +0800)]
perf jevents: Tidy error handling

There is much duplication in the error handling for directory transvering
for prcessing JSONs.

Factor out the common code to tidy a bit.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Link: https://lore.kernel.org/r/1603364547-197086-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace beauty: Allow header files in a different path
Namhyung Kim [Fri, 23 Oct 2020 02:06:28 +0000 (11:06 +0900)]
perf trace beauty: Allow header files in a different path

Current script to generate mmap flags and prot checks headers from the
uapi/asm-generic directory but it might come from a different directory
in some environment.  So change the pattern to accept it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201023020628.346257-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf stat: Add --quiet option
Andi Kleen [Tue, 27 Oct 2020 00:27:36 +0000 (17:27 -0700)]
perf stat: Add --quiet option

Add a new --quiet option to 'perf stat'. This is useful with 'perf stat
record' to write the data only to the perf.data file, which can lower
measurement overhead because the data doesn't need to be formatted.

On my 4C desktop:

  % time ./perf stat record  -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5
  ...
  real    0m5.377s
  user    0m0.238s
  sys     0m0.452s
  % time ./perf stat record --quiet -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5

  real    0m5.452s
  user    0m0.183s
  sys     0m0.423s

In this example it cuts the user time by 20%. On systems with more cores
the savings are higher.

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Link: http://lore.kernel.org/lkml/20201027002737.30942-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf stat: Support regex pattern in --for-each-cgroup
Namhyung Kim [Tue, 27 Oct 2020 07:28:55 +0000 (16:28 +0900)]
perf stat: Support regex pattern in --for-each-cgroup

To make the command line even more compact with cgroups, support regex
pattern matching in cgroup names.

  $ perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1

          3,000.73 msec cpu-clock                 foo #    2.998 CPUs utilized
    12,530,992,699      cycles                    foo #    7.517 GHz                      (100.00%)
          1,000.61 msec cpu-clock                 foo/bar #    1.000 CPUs utilized
     4,178,529,579      cycles                    foo/bar #    2.506 GHz                      (100.00%)
          1,000.03 msec cpu-clock                 foo/baz #    0.999 CPUs utilized
     4,176,104,315      cycles                    foo/baz #    2.505 GHz                      (100.00%)

       1.000892614 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201027072855.655449-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf test: Use generic event for expand_libpfm_events()
Namhyung Kim [Tue, 27 Oct 2020 07:28:54 +0000 (16:28 +0900)]
perf test: Use generic event for expand_libpfm_events()

I found that the UNHALTED_CORE_CYCLES event is only available in the
Intel machines and it makes other vendors/archs fail on the test.  As
libpfm4 can parse the generic events like cycles, let's use them.

Fixes: 40b74c30ffb9 ("perf test: Add expand cgroup event test")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201027072855.655449-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf kvm: Add kvm-stat for arm64
Sergey Senozhatsky [Tue, 27 Oct 2020 06:24:21 +0000 (15:24 +0900)]
perf kvm: Add kvm-stat for arm64

Add support for 'perf kvm stat' on arm64 platform.

Example:

  # perf kvm stat report

Analyze events for all VMs, all VCPUs:

    VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

   DABT_LOW     661867    98.91%    40.45%      2.19us   3364.65us      6.24us ( +-   0.34% )
        IRQ       4598     0.69%    57.44%      2.89us   3397.59us   1276.27us ( +-   1.61% )
        WFx       1475     0.22%     1.71%      2.22us   3388.63us    118.31us ( +-   8.69% )
   IABT_LOW       1018     0.15%     0.38%      2.22us   2742.07us     38.29us ( +-  12.55% )
      SYS64        180     0.03%     0.01%      2.07us    112.91us      6.57us ( +-  14.95% )
      HVC64         17     0.00%     0.01%      2.19us    322.35us     42.95us ( +-  58.98% )

Total Samples:669155, Total events handled time:10216387.86us.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Suleiman Souhlal <suleiman@google.com>
Link: http://lore.kernel.org/lkml/20201027062421.463355-1-sergey.senozhatsky@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf env: Conditionally compile BPF support code on having HAVE_LIBBPF_SUPPORT
Arnaldo Carvalho de Melo [Tue, 20 Oct 2020 18:57:21 +0000 (15:57 -0300)]
perf env: Conditionally compile BPF support code on having HAVE_LIBBPF_SUPPORT

If libbpf isn't selected, no need for a bunch of related code, that were
not even being used, as code using these perf_env methods was also
enclosed in HAVE_LIBBPF_SUPPORT.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf annotate: Move bpf header inclusion to inside HAVE_LIBBPF_SUPPORT
Arnaldo Carvalho de Melo [Tue, 20 Oct 2020 18:48:23 +0000 (15:48 -0300)]
perf annotate: Move bpf header inclusion to inside HAVE_LIBBPF_SUPPORT

No need to include it otherwise.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tests: Skip the llvm and bpf tests if HAVE_LIBBPF_SUPPORT isn't defined
Arnaldo Carvalho de Melo [Tue, 20 Oct 2020 18:00:51 +0000 (15:00 -0300)]
perf tests: Skip the llvm and bpf tests if HAVE_LIBBPF_SUPPORT isn't defined

If either NO_LIBBPF=1 is passed, explicitely disabling it or if libbpf
is not available due to some missing dependency, skip its tests, telling
the user the feature isn't available.

  # perf test
  <SNIP>
  40: LLVM search and compile                                         : Skip (not compiled in)
  41: Session topology                                                : Ok
  42: BPF filter                                                      : Skip (not compiled in)
  <SNIP>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf bpf: Enclose libbpf.h include within HAVE_LIBBPF_SUPPORT
Arnaldo Carvalho de Melo [Tue, 20 Oct 2020 17:12:37 +0000 (14:12 -0300)]
perf bpf: Enclose libbpf.h include within HAVE_LIBBPF_SUPPORT

As it uses the 'deprecated' attribute in a way that breaks the build
with old gcc compilers, so to continue being able to build in such
systems where NO_LIBBPF=1 is being used, enclose it under
HAVE_LIBBPF_SUPPORT.

   1 centos:6          : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
   2 oraclelinux:6     : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)

    CC       /tmp/build/perf/builtin-record.o
  In file included from util/bpf-loader.h:11,
                   from builtin-record.c:39:
  /git/linux/tools/lib/bpf/libbpf.h:203: error: wrong number of arguments specified for 'deprecated' attribute

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf test: Implement skip_reason callback for watchpoint tests
Tommi Rantala [Fri, 16 Oct 2020 13:16:50 +0000 (16:16 +0300)]
perf test: Implement skip_reason callback for watchpoint tests

Currently reason for skipping the read only watchpoint test is only seen
when running in verbose mode:

  $ perf test watchpoint
  23: Watchpoint                                            :
  23.1: Read Only Watchpoint                                : Skip
  23.2: Write Only Watchpoint                               : Ok
  23.3: Read / Write Watchpoint                             : Ok
  23.4: Modify Watchpoint                                   : Ok

  $ perf test -v watchpoint
  23: Watchpoint                                            :
  23.1: Read Only Watchpoint                                :
  --- start ---
  test child forked, pid 60204
  Hardware does not support read only watchpoints.
  test child finished with -2

Implement skip_reason callback for the watchpoint tests, so that it's
easy to see reason why the test is skipped:

  $ perf test watchpoint
  23: Watchpoint                                            :
  23.1: Read Only Watchpoint                                : Skip (missing hardware support)
  23.2: Write Only Watchpoint                               : Ok
  23.3: Read / Write Watchpoint                             : Ok
  23.4: Modify Watchpoint                                   : Ok

Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201016131650.72476-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tests tsc: Add checking helper is_supported()
Leo Yan [Mon, 19 Oct 2020 10:02:36 +0000 (18:02 +0800)]
perf tests tsc: Add checking helper is_supported()

So far tsc is enabled on x86_64, i386 and Arm64 architectures, add
checking helper to skip this testing for other architectures.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201019100236.23675-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tests tsc: Make tsc testing as a common testing
Leo Yan [Mon, 19 Oct 2020 10:02:35 +0000 (18:02 +0800)]
perf tests tsc: Make tsc testing as a common testing

x86 arch provides the testing for conversion between tsc and perf time,
the testing is located in x86 arch folder.  Move this testing out from
x86 arch folder and place it into the common testing folder, so allows
to execute tsc testing on other architectures (e.g. Arm64).

This patch removes the inclusion of "arch-tests.h" from the testing
code, this can avoid building failure if any arch has no this header
file.

Committer testing:

  $ perf test -v tsc
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
  70: Convert perf time to TSC                                        :
  --- start ---
  test child forked, pid 4032834
  mmap size 528384B
  1st event perf time 165409788843605 tsc 336578703793868
  rdtsc          time 165409788854986 tsc 336578703837038
  2nd event perf time 165409788855487 tsc 336578703838935
  test child finished with 0
  ---- end ----
  Convert perf time to TSC: Ok
  $

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201019100236.23675-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf mem2node: Improve warning if detected no memory nodes
Leo Yan [Mon, 19 Oct 2020 00:36:13 +0000 (08:36 +0800)]
perf mem2node: Improve warning if detected no memory nodes

Some archs (e.g. x86 and Arm64) don't enable the configuration
CONFIG_MEMORY_HOTPLUG by default, if this configuration is not enabled
when build the kernel image, the SysFS for memory nodes will be missed.
This results in perf tool has no chance to catpure the memory nodes
information, when perf tool reports the result and detects no memory
nodes, it outputs "assertion failed at util/mem2node.c:99".

The output log doesn't give out reason for the failure and users have no
clue for how to fix it.  This patch changes to use explicit way for
warning: it tells user that detected no memory nodes and suggests to
enable CONFIG_MEMORY_HOTPLUG for kernel building.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201019003613.8399-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf version: Add a feature for libpfm4
Ian Rogers [Mon, 19 Oct 2020 23:25:45 +0000 (16:25 -0700)]
perf version: Add a feature for libpfm4

If perf is built with libpfm4 (LIBPFM4=1) then advertise it in perf -vv.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201019232545.4047264-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf annotate mips: Add perf arch instructions annotate handlers
Dengcheng Zhu [Mon, 19 Oct 2020 07:21:24 +0000 (15:21 +0800)]
perf annotate mips: Add perf arch instructions annotate handlers

Support the MIPS architecture using the ins_ops association method. With
this patch, perf-annotate can work well on MIPS.

Testing it with a perf.data file collected on a mips machine:

$./perf annotate -i perf.data

         :           Disassembly of section .text:
         :
         :           00000000000be6a0 <get_next_seq>:
         :           get_next_seq():
    0.00 :   be6a0:       lw      v0,0(a0)
    0.00 :   be6a4:       daddiu  sp,sp,-128
    0.00 :   be6a8:       ld      a7,72(a0)
    0.00 :   be6ac:       gssq    s5,s4,80(sp)
    0.00 :   be6b0:       gssq    s1,s0,48(sp)
    0.00 :   be6b4:       gssq    s8,gp,112(sp)
    0.00 :   be6b8:       gssq    s7,s6,96(sp)
    0.00 :   be6bc:       gssq    s3,s2,64(sp)
    0.00 :   be6c0:       sd      a3,0(sp)
    0.00 :   be6c4:       move    s0,a0
    0.00 :   be6c8:       sd      v0,32(sp)
    0.00 :   be6cc:       sd      a5,8(sp)
    0.00 :   be6d0:       sd      zero,8(a0)
    0.00 :   be6d4:       sd      a6,16(sp)
    0.00 :   be6d8:       ld      s2,48(a0)
    8.53 :   be6dc:       ld      s1,40(a0)
    9.42 :   be6e0:       ld      v1,32(a0)
    0.00 :   be6e4:       nop
    0.00 :   be6e8:       ld      s4,24(a0)
    0.00 :   be6ec:       ld      s5,16(a0)
    0.00 :   be6f0:       sd      a7,40(sp)
   10.11 :   be6f4:       ld      s6,64(a0)

...

The original patch link:
https://lore.kernel.org/patchwork/patch/1180480/

Signed-off-by: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: linux-mips@vger.kernel.org
[ fanpeng@loongson.cn: Add missing "bgtzl", "bltzl", "bgezl", "blezl", "beql" and "bnel" for pre-R6processors ]
Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agodoc/admin-guide: Document creation of CAP_PERFMON privileged shell
Alexey Budankov [Mon, 19 Oct 2020 17:18:12 +0000 (20:18 +0300)]
doc/admin-guide: Document creation of CAP_PERFMON privileged shell

Document steps to create CAP_PERFMON privileged shell to unblock Perf
tool usage in cases when capabilities can't be assigned to an executable
due to limitations of used file system.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Link: http://lore.kernel.org/lkml/0abda956-de6c-95b1-61e8-49e146501079@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agodoc/admin-guide: Note credentials consolidation under CAP_PERFMON
Alexey Budankov [Mon, 19 Oct 2020 17:16:49 +0000 (20:16 +0300)]
doc/admin-guide: Note credentials consolidation under CAP_PERFMON

Add note that starting from Linux v5.9 CAP_PERFMON Linux capability is
enough to conduct performance monitoring and observability using
perf_events API.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Link: http://lore.kernel.org/lkml/2b1a92a1-84ce-5c70-837d-8ffe96849588@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge tag 'perf-tools-for-v5.10-2020-11-03' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 3 Nov 2020 21:28:50 +0000 (13:28 -0800)]
Merge tag 'perf-tools-for-v5.10-2020-11-03' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:
 "Only fixes and a sync of the headers so that the perf build is silent:

   - Fix visibility attribute in python module init code with newer gcc

   - Fix DRAM_BW_Use 0 issue for CLX/SKX in intel JSON vendor event
     files

   - Fix the build on new fedora by removing LTO compiler options when
     building perl support

   - Remove broken __no_tail_call attribute

   - Fix segfault when trying to trace events by cgroup

   - Fix crash with non-jited BPF progs

   - Increase buffer size in TUI browser, fixing format truncation

   - Fix printing of build-id for objects lacking one

   - Fix byte swapping for ino_generation field in MMAP2 perf.data
     records

   - Fix byte swapping for CGROUP perf.data records, for cross arch
     analysis of perf.data files

   - Fix the fast path of feature detection

   - Update kernel header copies"

* tag 'perf-tools-for-v5.10-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
  tools feature: Fixup fast path feature detection
  perf tools: Add missing swap for cgroup events
  perf tools: Add missing swap for ino_generation
  perf tools: Initialize output buffer in build_id__sprintf
  perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()
  tools include UAPI: Update linux/mount.h copy
  tools headers UAPI: Update tools's copy of linux/perf_event.h
  tools kvm headers: Update KVM headers from the kernel sources
  tools UAPI: Update copy of linux/mman.h from the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools x86 headers: Update required-features.h header from the kernel
  tools x86 headers: Update cpufeatures.h headers copies
  tools headers UAPI: Update fscrypt.h copy
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Sync prctl.h with the kernel sources
  perf scripting python: Avoid declaring function pointers with a visibility attribute
  perf tools: Remove broken __no_tail_call attribute
  perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX
  perf trace: Fix segfault when trying to trace events by cgroup
  perf tools: Fix crash with non-jited bpf progs
  ...

4 years agoMerge tag 'docs-5.10-warnings' of git://git.lwn.net/linux
Linus Torvalds [Tue, 3 Nov 2020 21:14:14 +0000 (13:14 -0800)]
Merge tag 'docs-5.10-warnings' of git://git.lwn.net/linux

Pull documentation build warning fixes from Jonathan Corbet:
 "This contains a series of warning fixes from Mauro; once applied, the
  number of warnings from the once-noisy docs build process is nearly
  zero.

  Getting to this point has required a lot of work; once there,
  hopefully we can keep things that way.

  I have packaged this as a separate pull because it does a fair amount
  of reaching outside of Documentation/. The changes are all in comments
  and in code placement. It's all been in linux-next since last week"

* tag 'docs-5.10-warnings' of git://git.lwn.net/linux: (24 commits)
  docs: SafeSetID: fix a warning
  amdgpu: fix a few kernel-doc markup issues
  selftests: kselftest_harness.h: fix kernel-doc markups
  drm: amdgpu_dm: fix a typo
  gpu: docs: amdgpu.rst: get rid of wrong kernel-doc markups
  drm: amdgpu: kernel-doc: update some adev parameters
  docs: fs: api-summary.rst: get rid of kernel-doc include
  IB/srpt: docs: add a description for cq_size member
  locking/refcount: move kernel-doc markups to the proper place
  docs: lockdep-design: fix some warning issues
  MAINTAINERS: fix broken doc refs due to yaml conversion
  ice: docs fix a devlink info that broke a table
  crypto: sun8x-ce*: update entries to its documentation
  net: phy: remove kernel-doc duplication
  mm: pagemap.h: fix two kernel-doc markups
  blk-mq: docs: add kernel-doc description for a new struct member
  docs: userspace-api: add iommu.rst to the index file
  docs: hwmon: mp2975.rst: address some html build warnings
  docs: net: statistics.rst: remove a duplicated kernel-doc
  docs: kasan.rst: add two missing blank lines
  ...

4 years agoMerge tag 'docs-5.10-3' of git://git.lwn.net/linux
Linus Torvalds [Tue, 3 Nov 2020 17:57:30 +0000 (09:57 -0800)]
Merge tag 'docs-5.10-3' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A small number of fixes, plus a build tweak to respect the desire for
  silence in V=0 builds"

* tag 'docs-5.10-3' of git://git.lwn.net/linux:
  docs: fix automarkup regression on Python 2
  documentation: arm: sunxi: add Allwinner H6 documents
  scripts: kernel-doc: split typedef complex regex
  scripts: kernel-doc: fix typedef parsing
  docs: Makefile: honor V=0 for docs building

4 years agoMerge tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 3 Nov 2020 17:55:09 +0000 (09:55 -0800)]
Merge tag 'x86_seves_for_v5.10_rc3' of git://git./linux/kernel/git/tip/tip

Pull x86 SEV-ES fixes from Borislav Petkov:
 "A couple of changes to the SEV-ES code to perform more stringent
  hypervisor checks before enabling encryption (Joerg Roedel)"

* tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Do not support MMIO to/from encrypted memory
  x86/head/64: Check SEV encryption before switching to kernel page-table
  x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path
  x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler
  x86/boot/compressed/64: Introduce sev_status

4 years agoafs: Fix incorrect freeing of the ACL passed to the YFS ACL store op
David Howells [Tue, 3 Nov 2020 16:33:07 +0000 (16:33 +0000)]
afs: Fix incorrect freeing of the ACL passed to the YFS ACL store op

The cleanup for the yfs_store_opaque_acl2_operation calls the wrong
function to destroy the ACL content buffer.  It's an afs_acl struct, not
a yfs_acl struct - and the free function for latter may pass invalid
pointers to kfree().

Fix this by using the afs_acl_put() function.  The yfs_acl_put()
function is then no longer used and can be removed.

general protection fault, probably for non-canonical address 0x7ebde00000000: 0000 [#1] SMP PTI
...
RIP: 0010:compound_head+0x0/0x11
...
Call Trace:
 virt_to_cache+0x8/0x51
 kfree+0x5d/0x79
 yfs_free_opaque_acl+0x16/0x29
 afs_put_operation+0x60/0x114
 __vfs_setxattr+0x67/0x72
 __vfs_setxattr_noperm+0x66/0xe9
 vfs_setxattr+0x67/0xce
 setxattr+0x14e/0x184
 __do_sys_fsetxattr+0x66/0x8f
 do_syscall_64+0x2d/0x3a
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoafs: Fix warning due to unadvanced marshalling pointer
David Howells [Tue, 3 Nov 2020 16:32:58 +0000 (16:32 +0000)]
afs: Fix warning due to unadvanced marshalling pointer

When using the afs.yfs.acl xattr to change an AuriStor ACL, a warning
can be generated when the request is marshalled because the buffer
pointer isn't increased after adding the last element, thereby
triggering the check at the end if the ACL wasn't empty.  This just
causes something like the following warning, but doesn't stop the call
from happening successfully:

    kAFS: YFS.StoreOpaqueACL2: Request buffer underflow (36<108)

Fix this simply by increasing the count prior to the check.

Fixes: f5e4546347bc ("afs: Implement YFS ACL setting")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodocs: fix automarkup regression on Python 2
Jonathan Corbet [Fri, 30 Oct 2020 15:35:39 +0000 (09:35 -0600)]
docs: fix automarkup regression on Python 2

It turns out that the Python 2 re module lacks the ASCII flag, so don't try
to use it there.

Fixes: f66e47f98c1e ("docs: automarkup.py: Fix regexes to solve sphinx 3 warnings")
Reported-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
4 years agotools feature: Fixup fast path feature detection
Arnaldo Carvalho de Melo [Tue, 3 Nov 2020 12:24:20 +0000 (09:24 -0300)]
tools feature: Fixup fast path feature detection

22dd1ac91a776752 ("tools: Remove feature-libelf-mmap feature detection")
correctly simplified the this feature detection, but forgot to remove
the call to the removed function in the main() function for the
test-all.c fast path feature detection, making it fail and thus do all
the feature detection individually, fix it.

  $ cat /tmp/build/perf/feature/test-all.make.output
  test-all.c: In function ‘main’:
  test-all.c:188:2: error: implicit declaration of function ‘main_test_libelf_mmap’; did you mean ‘main_test_libelf’? [-Werror=implicit-function-declaration]
    188 |  main_test_libelf_mmap();
        |  ^~~~~~~~~~~~~~~~~~~~~
        |  main_test_libelf
  cc1: all warnings being treated as errors
  $ vim tools/build/feature/test-all.c
  $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ;make V=1 -k O=/tmp/build/perf  -C tools/perf install-bin ; perf test python
  <SNIP>
  $ cat /tmp/build/perf/feature/test-all.make.output
  $

Fixes: 22dd1ac91a776752 ("tools: Remove feature-libelf-mmap feature detection")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Add missing swap for cgroup events
Namhyung Kim [Mon, 2 Nov 2020 14:02:28 +0000 (23:02 +0900)]
perf tools: Add missing swap for cgroup events

It was missed to add a swap function for PERF_RECORD_CGROUP.

Fixes: ba78c1c5461c ("perf tools: Basic support for CGROUP event")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201102140228.303657-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Add missing swap for ino_generation
Jiri Olsa [Sun, 1 Nov 2020 23:31:03 +0000 (00:31 +0100)]
perf tools: Add missing swap for ino_generation

We are missing swap for ino_generation field.

Fixes: 5c5e854bc760 ("perf tools: Add attr->mmap2 support")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201101233103.3537427-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Initialize output buffer in build_id__sprintf
Jiri Olsa [Sun, 1 Nov 2020 23:31:02 +0000 (00:31 +0100)]
perf tools: Initialize output buffer in build_id__sprintf

We display garbage for undefined build_id objects, because we don't
initialize the output buffer.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201101233103.3537427-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()
Song Liu [Fri, 30 Oct 2020 23:54:31 +0000 (16:54 -0700)]
perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()

Making perf with gcc-9.1.1 generates the following warning:

    CC       ui/browsers/hists.o
  ui/browsers/hists.c: In function 'perf_evsel__hists_browse':
  ui/browsers/hists.c:3078:61: error: '%d' directive output may be \
  truncated writing between 1 and 11 bytes into a region of size \
  between 2 and 12 [-Werror=format-truncation=]

   3078 |       "Max event group index to sort is %d (index from 0 to %d)",
        |                                                             ^~
  ui/browsers/hists.c:3078:7: note: directive argument in the range [-2147483648, 8]
   3078 |       "Max event group index to sort is %d (index from 0 to %d)",
        |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from /usr/include/stdio.h:937,
                   from ui/browsers/hists.c:5:

IOW, the string in line 3078 might be too long for buf[] of 64 bytes.

Fix this by increasing the size of buf[] to 128.

Fixes: dbddf1747441  ("perf report/top TUI: Support hotkeys to let user select any event for sorting")
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: stable@vger.kernel.org # v5.7+
Link: http://lore.kernel.org/lkml/20201030235431.534417-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools include UAPI: Update linux/mount.h copy
Arnaldo Carvalho de Melo [Tue, 3 Nov 2020 11:52:11 +0000 (08:52 -0300)]
tools include UAPI: Update linux/mount.h copy

To pick the changes from:

  dab741e0e02bd3c4 ("Add a "nosymfollow" mount option.")

That ends up adding support for the new MS_NOSYMFOLLOW mount flag:

  $ tools/perf/trace/beauty/mount_flags.sh > before
  $ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h
  $ tools/perf/trace/beauty/mount_flags.sh > after
  $ diff -u before after
  --- before 2020-11-03 08:51:28.117997454 -0300
  +++ after 2020-11-03 08:51:38.992218869 -0300
  @@ -7,6 +7,7 @@
    [32 ? (ilog2(32) + 1) : 0] = "REMOUNT",
    [64 ? (ilog2(64) + 1) : 0] = "MANDLOCK",
    [128 ? (ilog2(128) + 1) : 0] = "DIRSYNC",
  + [256 ? (ilog2(256) + 1) : 0] = "NOSYMFOLLOW",
    [1024 ? (ilog2(1024) + 1) : 0] = "NOATIME",
    [2048 ? (ilog2(2048) + 1) : 0] = "NODIRATIME",
    [4096 ? (ilog2(4096) + 1) : 0] = "BIND",
  $

So now one can use it in --filter expressions for tracepoints.

This silences this perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/mount.h' differs from latest version at 'include/uapi/linux/mount.h'
  diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mattias Nissler <mnissler@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers UAPI: Update tools's copy of linux/perf_event.h
Arnaldo Carvalho de Melo [Tue, 3 Nov 2020 11:49:59 +0000 (08:49 -0300)]
tools headers UAPI: Update tools's copy of linux/perf_event.h

The diff is just tabs versus spaces, trivial.

This silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools kvm headers: Update KVM headers from the kernel sources
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 16:42:55 +0000 (13:42 -0300)]
tools kvm headers: Update KVM headers from the kernel sources

Some should cause changes in tooling, like the one adding LAST_EXCP, but
the way it is structured end up not making that happen.

The new SVM_EXIT_INVPCID should get used by arch/x86/util/kvm-stat.c,
in the svm_exit_reasons table.

The tools/perf/trace/beauty part has scripts to catch changes and
automagically create tables, like tools/perf/trace/beauty/kvm_ioctl.sh,
but changes are needed to make tools/perf/arch/x86/util/kvm-stat.c catch
those automatically.

These were handled by the existing scripts:

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before 2020-11-03 08:43:52.910728608 -0300
  +++ after 2020-11-03 08:44:04.273959984 -0300
  @@ -89,6 +89,7 @@
    [0xbf] = "SET_NESTED_STATE",
    [0xc0] = "CLEAR_DIRTY_LOG",
    [0xc1] = "GET_SUPPORTED_HV_CPUID",
  + [0xc6] = "X86_SET_MSR_FILTER",
    [0xe0] = "CREATE_DEVICE",
    [0xe1] = "SET_DEVICE_ATTR",
    [0xe2] = "GET_DEVICE_ATTR",
  $
  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
  $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
  $
  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
  $ diff -u before after
  --- before 2020-11-03 08:45:55.522225198 -0300
  +++ after 2020-11-03 08:46:12.881578666 -0300
  @@ -37,4 +37,5 @@
    [0x71] = "VDPA_GET_STATUS",
    [0x73] = "VDPA_GET_CONFIG",
    [0x76] = "VDPA_GET_VRING_NUM",
  + [0x78] = "VDPA_GET_IOVA_RANGE",
   };
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
  diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
  Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/sie.h' differs from latest version at 'arch/s390/include/uapi/asm/sie.h'
  diff -u tools/arch/s390/include/uapi/asm/sie.h arch/s390/include/uapi/asm/sie.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
  diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
  diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
  Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
  diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools UAPI: Update copy of linux/mman.h from the kernel sources
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 16:36:41 +0000 (13:36 -0300)]
tools UAPI: Update copy of linux/mman.h from the kernel sources

  e47168f3d1b14af5 ("powerpc/8xx: Support 16k hugepages with 4k pages")

That don't cause any changes in tooling, just addresses this perf build
warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
  diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools arch x86: Sync the msr-index.h copy with the kernel sources
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 16:28:04 +0000 (13:28 -0300)]
tools arch x86: Sync the msr-index.h copy with the kernel sources

To pick up the changes in:

  29dcc60f6a19fb0a ("x86/boot/compressed/64: Add stage1 #VC handler")
  36e1be8ada994d50 ("perf/x86/amd/ibs: Fix raw sample data accumulation")
  59a854e2f3b90ad2 ("perf/x86/intel: Support TopDown metrics on Ice Lake")
  7b2c05a15d29d057 ("perf/x86/intel: Generic support for hardware TopDown metrics")
  99e40204e014e066 ("x86/msr: Move the F15h MSRs where they belong")
  b57de6cd16395be1 ("x86/sev-es: Add SEV-ES Feature Detection")
  ed7bde7a6dab521e ("cpufreq: intel_pstate: Allow enable/disable energy efficiency")
  f0f2f9feb4ee6f28 ("x86/msr-index: Define an IA32_PASID MSR")

That cause these changes in tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before 2020-10-19 13:27:33.195274425 -0300
  +++ after 2020-10-19 13:27:44.144507610 -0300
  @@ -113,6 +113,8 @@
    [0x00000309] = "CORE_PERF_FIXED_CTR0",
    [0x0000030a] = "CORE_PERF_FIXED_CTR1",
    [0x0000030b] = "CORE_PERF_FIXED_CTR2",
  + [0x0000030c] = "CORE_PERF_FIXED_CTR3",
  + [0x00000329] = "PERF_METRICS",
    [0x00000345] = "IA32_PERF_CAPABILITIES",
    [0x0000038d] = "CORE_PERF_FIXED_CTR_CTRL",
    [0x0000038e] = "CORE_PERF_GLOBAL_STATUS",
  @@ -222,6 +224,7 @@
    [0x00000774] = "HWP_REQUEST",
    [0x00000777] = "HWP_STATUS",
    [0x00000d90] = "IA32_BNDCFGS",
  + [0x00000d93] = "IA32_PASID",
    [0x00000da0] = "IA32_XSS",
    [0x00000dc0] = "LBR_INFO_0",
    [0x00000ffc] = "IA32_BNDCFGS_RSVD",
  @@ -279,6 +282,7 @@
    [0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE",
    [0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA",
    [0xc001011f - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VIRT_SPEC_CTRL",
  + [0xc0010130 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV_ES_GHCB",
    [0xc0010131 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV",
    [0xc0010140 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_ID_LENGTH",
    [0xc0010141 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_STATUS",
  $

Which causes these parts of tools/perf/ to be rebuilt:

  CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
  DESCEND  plugins
  GEN      /tmp/build/perf/python/perf.so
  INSTALL  trace_plugins
  LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
  LD       /tmp/build/perf/trace/beauty/perf-in.o
  LD       /tmp/build/perf/perf-in.o
  LINK     /tmp/build/perf/per

At some point these should just be tables read by perf on demand.

This addresses this perf tools build warning:

  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools x86 headers: Update required-features.h header from the kernel
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 16:24:52 +0000 (13:24 -0300)]
tools x86 headers: Update required-features.h header from the kernel

To pick the changes from:

  ecac71816a1829c0 ("x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT")

That don entail any changes in tooling, just addressing these perf tools
build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/required-features.h' differs from latest version at 'arch/x86/include/asm/required-features.h'
  diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools x86 headers: Update cpufeatures.h headers copies
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 16:21:04 +0000 (13:21 -0300)]
tools x86 headers: Update cpufeatures.h headers copies

To pick the changes from:

  5866e9205b47a983 ("x86/cpu: Add hardware-enforced cache coherency as a CPUID feature")
  ff4f82816dff28ff ("x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions")
  360e7c5c4ca4fd8e ("x86/cpufeatures: Add SEV-ES CPU feature")
  18ec63faefb3fd31 ("x86/cpufeatures: Enumerate TSX suspend load address tracking instructions")
  e48cb1a3fb916500 ("x86/resctrl: Enumerate per-thread MBA controls")

Which don't cause any changes in tooling, just addresses these build
warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Kyung Min Park <kyung.min.park@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers UAPI: Update fscrypt.h copy
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 16:12:52 +0000 (13:12 -0300)]
tools headers UAPI: Update fscrypt.h copy

To get the changes from:

  c7f0207b613033c5 ("fscrypt: make "#define fscrypt_policy" user-only")

That don't cause any changes in tools/perf, only addresses this perf
tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
  diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers UAPI: Sync drm/i915_drm.h with the kernel sources
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 15:41:58 +0000 (12:41 -0300)]
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources

To pick the changes in:

  13149e8bafc46572 ("drm/i915: add syncobj timeline support")
  cda9edd02425d790 ("drm/i915: introduce a mechanism to extend execbuf2")

That don't result in any changes in tooling, just silences this perf
build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers UAPI: Sync prctl.h with the kernel sources
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 15:38:16 +0000 (12:38 -0300)]
tools headers UAPI: Sync prctl.h with the kernel sources

To get the changes in:

  1c101da8b971a366 ("arm64: mte: Allow user control of the tag check mode via prctl()")
  af5ce95282dc99d0 ("arm64: mte: Allow user control of the generated random tags via prctl()")

Which don't cause any change in tooling, only addresses this perf build
warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripting python: Avoid declaring function pointers with a visibility attribute
Arnaldo Carvalho de Melo [Fri, 30 Oct 2020 11:24:38 +0000 (08:24 -0300)]
perf scripting python: Avoid declaring function pointers with a visibility attribute

To avoid this:

  util/scripting-engines/trace-event-python.c: In function 'python_start_script':
  util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes]
   1595 |  PyMODINIT_FUNC (*initfunc)(void);
        |  ^~~~~~~~~~~~~~

That started breaking when building with PYTHON=python3 and these gcc
versions (I haven't checked with the clang ones, maybe it breaks there
as well):

  # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz
  # dm  fedora:33 fedora:rawhide
     1   107.80 fedora:33         : Ok   gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33)
     2    92.47 fedora:rawhide    : Ok   gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34)
  #

Avoid that by ditching that 'initfunc' function pointer with its:

    #define Py_EXPORTED_SYMBOL _attribute_ ((visibility ("default")))
    #define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject*

And just call PyImport_AppendInittab() at the end of the ifdef python3
block with the functions that were being attributed to that initfunc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Remove broken __no_tail_call attribute
Peter Zijlstra [Wed, 28 Oct 2020 08:11:23 +0000 (09:11 +0100)]
perf tools: Remove broken __no_tail_call attribute

The GCC specific __attribute__((optimize)) attribute does not what is
commonly expected and is explicitly recommended against using in
production code by the GCC people.

Unlike what is often expected, it doesn't add to the optimization flags,
but it fully replaces them, loosing any and all optimization flags
provided by the compiler commandline.

The only guaranteed upon means of inhibiting tail-calls is by placing a
volatile asm with side-effects after the call such that the tail-call simply
cannot be done.

Given the original commit wasn't specific on which calls were the problem, this
removal might re-introduce the problem, which can then be re-analyzed and cured
properly.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX
Jin Yao [Fri, 23 Oct 2020 00:53:34 +0000 (08:53 +0800)]
perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX

Ian reports an issue that the metric DRAM_BW_Use often remains 0.

The metric expression for DRAM_BW_Use on CLX/SKX:

"( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time"

The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/
are scaled up by 64, that is to turn a count of cache lines into bytes,
the count is then divided by 1000000000 to give GB.

However, the counts of uncore_imc/cas_count_read/ and
uncore_imc/cas_count_write/ have been scaled yet.

The scale values are from sysfs, such as
/sys/devices/uncore_imc_0/events/cas_count_read.scale.
It's 6.103515625e-5 (64 / 1024.0 / 1024.0).

So if we use original metric expression, the result is not correct.

But the difficulty is, for SKL client, the counts are not scaled.

The metric expression for DRAM_BW_Use on SKL:

"64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000"

root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1

 Performance counter stats for 'system wide':

               190      arb/event=0x84,umask=0x1/ #     1.86 DRAM_BW_Use
        29,093,178      arb/event=0x81,umask=0x1/
     1,000,703,287 ns   duration_time

       1.000703287 seconds time elapsed

The result is expected.

So the easy way is just change the metric expression for CLX/SKX.
This patch changes the metric expression to:

"( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time"

1048576 = 1024 * 1024.

Before (tested on CLX):

root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1

 Performance counter stats for 'system wide':

            765.35 MiB  uncore_imc/cas_count_read/ #     0.00 DRAM_BW_Use
              5.42 MiB  uncore_imc/cas_count_write/
        1001515088 ns   duration_time

       1.001515088 seconds time elapsed

After:

root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1

 Performance counter stats for 'system wide':

            767.95 MiB  uncore_imc/cas_count_read/ #     0.80 DRAM_BW_Use
              5.02 MiB  uncore_imc/cas_count_write/
        1001900010 ns   duration_time

       1.001900010 seconds time elapsed

Fixes: 038d3b53c284 ("perf vendor events intel: Update CascadelakeX events to v1.08")
Fixes: b5ff7f2799a4 ("perf vendor events: Update SkylakeX events to v1.21")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201023005334.7869-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Fix segfault when trying to trace events by cgroup
Stanislav Ivanichkin [Tue, 27 Oct 2020 09:43:57 +0000 (12:43 +0300)]
perf trace: Fix segfault when trying to trace events by cgroup

  # ./perf trace -e sched:sched_switch -G test -a sleep 1
  perf: Segmentation fault
  Obtained 11 stack frames.
  ./perf(sighandler_dump_stack+0x43) [0x55cfdc636db3]
  /lib/x86_64-linux-gnu/libc.so.6(+0x3efcf) [0x7fd23eecafcf]
  ./perf(parse_cgroups+0x36) [0x55cfdc673f36]
  ./perf(+0x3186ed) [0x55cfdc70d6ed]
  ./perf(parse_options_subcommand+0x629) [0x55cfdc70e999]
  ./perf(cmd_trace+0x9c2) [0x55cfdc5ad6d2]
  ./perf(+0x1e8ae0) [0x55cfdc5ddae0]
  ./perf(+0x1e8ded) [0x55cfdc5ddded]
  ./perf(main+0x370) [0x55cfdc556f00]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fd23eeadb96]
  ./perf(_start+0x29) [0x55cfdc557389]
  Segmentation fault
  #

 It happens because "struct trace" in option->value is passed to the
 parse_cgroups function instead of "struct evlist".

Fixes: 9ea42ba4411ac ("perf trace: Support setting cgroups as targets")
Signed-off-by: Stanislav Ivanichkin <sivanichkin@yandex-team.ru>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Link: http://lore.kernel.org/lkml/20201027094357.94881-1-sivanichkin@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Fix crash with non-jited bpf progs
Tommi Rantala [Fri, 16 Oct 2020 11:47:18 +0000 (14:47 +0300)]
perf tools: Fix crash with non-jited bpf progs

The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to
the bpf interpreter, ie. within kernel text section. When processing the
unregister event, this causes unexpected removal of vmlinux_map,
crashing perf later in cleanup:

  # perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop
  PCOMM            PID    PPID   RET ARGS
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ]
  perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
  Aborted (core dumped)

  # perf script -D|grep KSYM
  0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
  0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2
  0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
  108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve
  108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
  109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve
  109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
  perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.

Here the addresses match the bpf interpreter:

  # grep -e ffffffffa9b6b530 -e ffffffffa9b6b3b0 -e ffffffffa9b6b3f0 /proc/kallsyms
  ffffffffa9b6b3b0 t __bpf_prog_run224
  ffffffffa9b6b3f0 t __bpf_prog_run192
  ffffffffa9b6b530 t __bpf_prog_run32

Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL
unregister event.

Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201016114718.54332-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers UAPI: Update process_madvise affected files
Arnaldo Carvalho de Melo [Tue, 3 Nov 2020 11:29:30 +0000 (08:29 -0300)]
tools headers UAPI: Update process_madvise affected files

To pick the changes from:

  ecb8ac8b1f146915 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")

That addresses these perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Update copy of libbpf's hashmap.c
Arnaldo Carvalho de Melo [Mon, 19 Oct 2020 15:26:45 +0000 (12:26 -0300)]
perf tools: Update copy of libbpf's hashmap.c

To pick the changes in:

  85367030a6c7ef33 ("libbpf: Centralize poisoning and poison reallocarray()")
  7d9c71e10baa3496 ("libbpf: Extract generic string hashing function for reuse")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
  diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Remove LTO compiler options when building perl support
Justin M. Forbes [Wed, 28 Oct 2020 16:59:00 +0000 (13:59 -0300)]
perf tools: Remove LTO compiler options when building perl support

To avoid breaking the build by mixing files compiled with things coming
from distro specific compiler options for perl with the rest of perf,
i.e. to avoid this:

  `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o

Noticed on Fedora 33.

Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1593431
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://src.fedoraproject.org/rpms/kernel-tools/c/589a32b62f0c12516ab7b34e3dd30d450145bfa4?branch=master
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotty: make FONTX ioctl use the tty pointer they were actually passed
Linus Torvalds [Mon, 26 Oct 2020 20:15:23 +0000 (13:15 -0700)]
tty: make FONTX ioctl use the tty pointer they were actually passed

Some of the font tty ioctl's always used the current foreground VC for
their operations.  Don't do that then.

This fixes a data race on fg_console.

Side note: both Michael Ellerman and Jiri Slaby point out that all these
ioctls are deprecated, and should probably have been removed long ago,
and everything seems to be using the KDFONTOP ioctl instead.

In fact, Michael points out that it looks like busybox's loadfont
program seems to have switched over to using KDFONTOP exactly _because_
of this bug (ahem.. 12 years ago ;-).

Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Mon, 2 Nov 2020 22:47:37 +0000 (14:47 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "Subsystems affected by this patch series: mm (memremap, memcg,
  slab-generic, kasan, mempolicy, pagecache, oom-kill, pagemap),
  kthread, signals, lib, epoll, and core-kernel"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kernel/hung_task.c: make type annotations consistent
  epoll: add a selftest for epoll timeout race
  mm: always have io_remap_pfn_range() set pgprot_decrypted()
  mm, oom: keep oom_adj under or at upper limit when printing
  kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
  mm/truncate.c: make __invalidate_mapping_pages() static
  lib/crc32test: remove extra local_irq_disable/enable
  ptrace: fix task_join_group_stop() for the case when current is traced
  mm: mempolicy: fix potential pte_unmap_unlock pte error
  kasan: adopt KUNIT tests to SW_TAGS mode
  mm: memcg: link page counters to root if use_hierarchy is false
  mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
  hugetlb_cgroup: fix reservation accounting
  mm/mremap_pages: fix static key devmap_managed_key updates

4 years agokernel/hung_task.c: make type annotations consistent
Lukas Bulwahn [Mon, 2 Nov 2020 01:08:10 +0000 (17:08 -0800)]
kernel/hung_task.c: make type annotations consistent

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
removed various __user annotations from function signatures as part of
its refactoring.

It also removed the __user annotation for proc_dohung_task_timeout_secs()
at its declaration in sched/sysctl.h, but not at its definition in
kernel/hung_task.c.

Hence, sparse complains:

  kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))

Adjust the annotation at the definition fitting to that refactoring to make
sparse happy again, which also resolves this warning from sparse:

  kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
  kernel/hung_task.c:277:52:    expected void *
  kernel/hung_task.c:277:52:    got void [noderef] __user *buffer

No functional change. No change in object code.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Ignatov <rdna@fb.com>
Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoepoll: add a selftest for epoll timeout race
Soheil Hassas Yeganeh [Mon, 2 Nov 2020 01:08:07 +0000 (17:08 -0800)]
epoll: add a selftest for epoll timeout race

Add a test case to ensure an event is observed by at least one poller
when an epoll timeout is used.

Signed-off-by: Guantao Liu <guantaol@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lkml.kernel.org/r/20201028180202.952079-2-soheil.kdev@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: always have io_remap_pfn_range() set pgprot_decrypted()
Jason Gunthorpe [Mon, 2 Nov 2020 01:08:00 +0000 (17:08 -0800)]
mm: always have io_remap_pfn_range() set pgprot_decrypted()

The purpose of io_remap_pfn_range() is to map IO memory, such as a
memory mapped IO exposed through a PCI BAR.  IO devices do not
understand encryption, so this memory must always be decrypted.
Automatically call pgprot_decrypted() as part of the generic
implementation.

This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
using io_remap_pfn_range() to expose BAR pages to user space to fail.
The CPU will encrypt access to those BAR pages instead of passing
unencrypted IO directly to the device.

Places not mapping IO should use remap_pfn_range().

Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Dave Young" <dyoung@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/0-v1-025d64bdf6c4+e-amd_sme_fix_jgg@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm, oom: keep oom_adj under or at upper limit when printing
Charles Haithcock [Mon, 2 Nov 2020 01:07:56 +0000 (17:07 -0800)]
mm, oom: keep oom_adj under or at upper limit when printing

For oom_score_adj values in the range [942,999], the current
calculations will print 16 for oom_adj.  This patch simply limits the
output so output is inline with docs.

Signed-off-by: Charles Haithcock <chaithco@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://lkml.kernel.org/r/20201020165130.33927-1-chaithco@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
Zqiang [Mon, 2 Nov 2020 01:07:53 +0000 (17:07 -0800)]
kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled

There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:

CPU0 CPU1
kthread_cancel_delayed_work_sync()
   __kthread_cancel_work_sync()
     __kthread_cancel_work()
        work->canceling++;
      kthread_delayed_work_timer_fn()
   kthread_insert_work();

BUG: kthread_insert_work() should not get called when work->canceling is
set.

Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/truncate.c: make __invalidate_mapping_pages() static
Jason Yan [Mon, 2 Nov 2020 01:07:50 +0000 (17:07 -0800)]
mm/truncate.c: make __invalidate_mapping_pages() static

Fix the following sparse warning:

  mm/truncate.c:531:15: warning: symbol '__invalidate_mapping_pages' was not declared. Should it be static?

Fixes: eb1d7a65f08a ("mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20201015054808.2445904-1-yanaijie@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/crc32test: remove extra local_irq_disable/enable
Vasily Gorbik [Mon, 2 Nov 2020 01:07:47 +0000 (17:07 -0800)]
lib/crc32test: remove extra local_irq_disable/enable

Commit 4d004099a668 ("lockdep: Fix lockdep recursion") uncovered the
following issue in lib/crc32test reported on s390:

  BUG: using __this_cpu_read() in preemptible [00000000] code: swapper/0/1
  caller is lockdep_hardirqs_on_prepare+0x48/0x270
  CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.9.0-next-20201015-15164-g03d992bd2de6 #19
  Hardware name: IBM 3906 M04 704 (LPAR)
  Call Trace:
    lockdep_hardirqs_on_prepare+0x48/0x270
    trace_hardirqs_on+0x9c/0x1b8
    crc32_test.isra.0+0x170/0x1c0
    crc32test_init+0x1c/0x40
    do_one_initcall+0x40/0x130
    do_initcalls+0x126/0x150
    kernel_init_freeable+0x1f6/0x230
    kernel_init+0x22/0x150
    ret_from_fork+0x24/0x2c
  no locks held by swapper/0/1.

Remove extra local_irq_disable/local_irq_enable helpers calls.

Fixes: 5fb7f87408f1 ("lib: add module support to crc32 tests")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/patch.git-4369da00c06e.your-ad-here.call-01602859837-ext-1679@work.hours
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoptrace: fix task_join_group_stop() for the case when current is traced
Oleg Nesterov [Mon, 2 Nov 2020 01:07:44 +0000 (17:07 -0800)]
ptrace: fix task_join_group_stop() for the case when current is traced

This testcase

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>
#include <assert.h>

void *tf(void *arg)
{
return NULL;
}

int main(void)
{
int pid = fork();
if (!pid) {
kill(getpid(), SIGSTOP);

pthread_t th;
pthread_create(&th, NULL, tf, NULL);

return 0;
}

waitpid(pid, NULL, WSTOPPED);

ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
waitpid(pid, NULL, 0);

ptrace(PTRACE_CONT, pid, 0,0);
waitpid(pid, NULL, 0);

int status;
int thread = waitpid(-1, &status, 0);
assert(thread > 0 && thread != pid);
assert(status == 0x80137f);

return 0;
}

fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().

This is because task_join_group_stop() has 2 problems when current is traced:

1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
   can be woken up by debugger and it can clone another thread which
   should join the group-stop.

   We need to check group_stop_count || SIGNAL_STOP_STOPPED.

2. If SIGNAL_STOP_STOPPED is already set, we should not increment
   sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
   should stop without another do_notify_parent_cldstop() report.

To clarify, the problem is very old and we should blame
ptrace_init_task().  But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.

Reported-by: syzbot+3485e3773f7da290eecc@syzkaller.appspotmail.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <christian@brauner.io>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: mempolicy: fix potential pte_unmap_unlock pte error
Shijie Luo [Mon, 2 Nov 2020 01:07:40 +0000 (17:07 -0800)]
mm: mempolicy: fix potential pte_unmap_unlock pte error

When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.

queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page.  Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte.  This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..

Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Shijie Luo <luoshijie1@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokasan: adopt KUNIT tests to SW_TAGS mode
Andrey Konovalov [Mon, 2 Nov 2020 01:07:37 +0000 (17:07 -0800)]
kasan: adopt KUNIT tests to SW_TAGS mode

Now that we have KASAN-KUNIT tests integration, it's easy to see that
some KASAN tests are not adopted to the SW_TAGS mode and are failing.

Adjust the allocation size for kasan_memchr() and kasan_memcmp() by
roung it up to OOB_TAG_OFF so the bad access ends up in a separate
memory granule.

Add a new kmalloc_uaf_16() tests that relies on UAF, and a new
kasan_bitops_tags() test that is tailored to tag-based mode, as it's
hard to adopt the existing kmalloc_oob_16() and kasan_bitops_generic()
(renamed from kasan_bitops()) without losing the precision.

Add new kmalloc_uaf_16() and kasan_bitops_uaf() tests that rely on UAFs,
as it's hard to adopt the existing kmalloc_oob_16() and
kasan_bitops_oob() (rename from kasan_bitops()) without losing the
precision.

Disable kasan_global_oob() and kasan_alloca_oob_left/right() as SW_TAGS
mode doesn't instrument globals nor dynamic allocas.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: David Gow <davidgow@google.com>
Link: https://lkml.kernel.org/r/76eee17b6531ca8b3ca92b240cb2fd23204aaff7.1603129942.git.andreyknvl@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: memcg: link page counters to root if use_hierarchy is false
Roman Gushchin [Mon, 2 Nov 2020 01:07:34 +0000 (17:07 -0800)]
mm: memcg: link page counters to root if use_hierarchy is false

Richard reported a warning which can be reproduced by running the LTP
madvise6 test (cgroup v1 in the non-hierarchical mode should be used):

  WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
  Modules linked in:
  CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
  Workqueue: events drain_local_stock
  RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
  Call Trace:
    __memcg_kmem_uncharge (mm/memcontrol.c:3022)
    drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
    drain_local_stock (mm/memcontrol.c:2255)
    process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
    worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:292)
    ret_from_fork (arch/x86/entry/entry_64.S:300)

The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.

After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining a objcg
stock, for example.  It leads to an eventual underflow of the charge and
triggers a warning.

Fix it by linking all page counters to corresponding root page counters
in the non-hierarchical mode.

Please note, that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more
than 1 level.  This patch doesn't change it.

The patch also doesn't affect how the hierarchical mode is working,
which is the only sane and truly supported mode now.

Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!

Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Reported-by: <ltp@lists.linux.it>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com
Debugged-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
zhongjiang-ali [Mon, 2 Nov 2020 01:07:30 +0000 (17:07 -0800)]
mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg

memcg_page_state will get the specified number in hierarchical memcg, It
should multiply by HPAGE_PMD_NR rather than an page if the item is
NR_ANON_THPS.

[akpm@linux-foundation.org: fix printk warning]
[akpm@linux-foundation.org: use u64 cast, per Michal]

Fixes: 468c398233da ("mm: memcontrol: switch to native NR_ANON_THPS counter")
Signed-off-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/1603722395-72443-1-git-send-email-zhongjiang-ali@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agohugetlb_cgroup: fix reservation accounting
Mike Kravetz [Mon, 2 Nov 2020 01:07:27 +0000 (17:07 -0800)]
hugetlb_cgroup: fix reservation accounting

Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and hit the warning below.  QEMU with free page hinting
uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
as free by a VM.  The reporting granularity is in pageblock granularity.
So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
one huge page in QEMU.

  WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
  Modules linked in: ...
  CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
  Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
  RIP: 0010:page_counter_uncharge+0x4b/0x50
  ...
  Call Trace:
    hugetlb_cgroup_uncharge_file_region+0x4b/0x80
    region_del+0x1d3/0x300
    hugetlb_unreserve_pages+0x39/0xb0
    remove_inode_hugepages+0x1a8/0x3d0
    hugetlbfs_fallocate+0x3c4/0x5c0
    vfs_fallocate+0x146/0x290
    __x64_sys_fallocate+0x3e/0x70
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Investigation of the issue uncovered bugs in hugetlb cgroup reservation
accounting.  This patch addresses the found issues.

Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: Michal Privoznik <mprivozn@redhat.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20201021204426.36069-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/mremap_pages: fix static key devmap_managed_key updates
Ralph Campbell [Mon, 2 Nov 2020 01:07:23 +0000 (17:07 -0800)]
mm/mremap_pages: fix static key devmap_managed_key updates

commit 6f42193fd86e ("memremap: don't use a separate devm action for
devmap_managed_enable_get") changed the static key updates such that we
now call devmap_managed_enable_put() without doing the equivalent
devmap_managed_enable_get().

devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
types too.  This results in the below warning when switching between
system-ram and devdax mode for devdax namespace.

   jump label: negative count!
   WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
   Modules linked in:
   ....

   NIP static_key_slow_try_dec+0x88/0xa0
   LR static_key_slow_try_dec+0x84/0xa0
   Call Trace:
     static_key_slow_try_dec+0x84/0xa0
     __static_key_slow_dec_cpuslocked+0x34/0xd0
     static_key_slow_dec+0x54/0xf0
     memunmap_pages+0x36c/0x500
     devm_action_release+0x30/0x50
     release_nodes+0x2f4/0x3e0
     device_release_driver_internal+0x17c/0x280
     bus_remove_device+0x124/0x210
     device_del+0x1d4/0x530
     unregister_dev_dax+0x48/0xe0
     devm_action_release+0x30/0x50
     release_nodes+0x2f4/0x3e0
     device_release_driver_internal+0x17c/0x280
     unbind_store+0x130/0x170
     drv_attr_store+0x40/0x60
     sysfs_kf_write+0x6c/0xb0
     kernfs_fop_write+0x118/0x280
     vfs_write+0xe8/0x2a0
     ksys_write+0x84/0x140
     system_call_exception+0x120/0x270
     system_call_common+0xf0/0x27c

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20201023183222.13186-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoof: Drop superfluous ULL suffix for ~0
Geert Uytterhoeven [Mon, 2 Nov 2020 10:54:22 +0000 (11:54 +0100)]
of: Drop superfluous ULL suffix for ~0

There is no need to specify a "ULL" suffix for "all bits set": "~0" is
sufficient, and works regardless of type.  In fact adding the suffix
makes the code more fragile.

Fixes: 48ab6d5d1f09 ("dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoLinux 5.10-rc2
Linus Torvalds [Sun, 1 Nov 2020 22:43:51 +0000 (14:43 -0800)]
Linux 5.10-rc2

4 years agoMerge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Nov 2020 19:21:26 +0000 (11:21 -0800)]
Merge tag 'x86-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Three fixes all related to #DB:

   - Handle the BTF bit correctly so it doesn't get lost due to a kernel
     #DB

   - Only clear and set the virtual DR6 value used by ptrace on user
     space triggered #DB. A kernel #DB must leave it alone to ensure
     data consistency for ptrace.

   - Make the bitmasking of the virtual DR6 storage correct so it does
     not lose DR_STEP"

* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
  x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
  x86/debug: Fix BTF handling

4 years agoMerge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 1 Nov 2020 19:13:45 +0000 (11:13 -0800)]
Merge tag 'timers-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "A few fixes for timers/timekeeping:

   - Prevent undefined behaviour in the timespec64_to_ns() conversion
     which is used for converting user supplied time input to
     nanoseconds. It lacked overflow protection.

   - Mark sched_clock_read_begin/retry() to prevent recursion in the
     tracer

   - Remove unused debug functions in the hrtimer and timerlist code"

* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Prevent undefined behaviour in timespec64_to_ns()
  timers: Remove unused inline funtion debug_timer_free()
  hrtimer: Remove unused inline function debug_hrtimer_free()
  time/sched_clock: Mark sched_clock_read_begin/retry() as notrace

4 years agoMerge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Nov 2020 19:11:38 +0000 (11:11 -0800)]
Merge tag 'smp-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull smp fix from Thomas Gleixner:
 "A single fix for stop machine.

  Mark functions no trace to prevent a crash caused by recursion when
  enabling or disabling a tracer on RISC-V (probably all architectures
  which patch through stop machine)"

* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine, rcu: Mark functions as notrace

4 years agoMerge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 1 Nov 2020 19:08:17 +0000 (11:08 -0800)]
Merge tag 'locking-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - Fix incorrect failure injection handling in the fuxtex code

   - Prevent a preemption warning in lockdep when tracking
     local_irq_enable() and interrupts are already enabled

   - Remove more raw_cpu_read() usage from lockdep which causes state
     corruption on !X86 architectures.

   - Make the nr_unused_locks accounting in lockdep correct again"

* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix nr_unused_locks accounting
  locking/lockdep: Remove more raw_cpu_read() usage
  futex: Fix incorrect should_fail_futex() handling
  lockdep: Fix preemption WARN for spurious IRQ-enable

4 years agoMerge tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 1 Nov 2020 18:05:16 +0000 (10:05 -0800)]
Merge tag 'char-misc-5.10-rc2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes/removals from Greg KH:
 "Here's some small fixes for 5.10-rc2 and a big driver removal.

  The fixes are for some reported issues in the interconnect and
  coresight drivers, nothing major.

  The "big" driver removal is the MIC drivers have been asked to be
  removed as the hardware never shipped and Intel no longer wants to
  maintain something that no one can use. This is welcomed by many as
  the DMA usage of these drivers was "interesting" and the security
  people were starting to question some issues that were starting to be
  found in the codebase.

  Note, one of the subsystems for this driver, the "VOP" code, will
  probably come back in future kernel versions as it was looking to
  potentially solve some PCIe virtualization issues that a number of
  other vendors were wanting to solve. But as-is, this codebase didn't
  work for anyone else so no actual functionality is being removed.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  coresight: cti: Initialize dynamic sysfs attributes
  coresight: Fix uninitialised pointer bug in etm_setup_aux()
  coresight: add module license
  misc: mic: remove the MIC drivers
  interconnect: qcom: use icc_sync state for sm8[12]50
  interconnect: qcom: Ensure that the floor bandwidth value is enforced
  interconnect: qcom: sc7180: Init BCMs before creating the nodes
  interconnect: qcom: sdm845: Init BCMs before creating the nodes
  interconnect: Aggregate before setting initial bandwidth
  interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM

4 years agoMerge tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Nov 2020 17:59:13 +0000 (09:59 -0800)]
Merge tag 'driver-core-5.10-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core and documentation fixes from Greg KH:
 "Here is one tiny debugfs change to fix up an API where the last user
  was successfully fixed up in 5.10-rc1 (so it couldn't be merged
  earlier), and a much larger Documentation/ABI/ update to the files so
  they can be automatically parsed by our tools.

  The Documentation/ABI/ updates are just formatting issues, small ones
  to bring the files into parsable format, and have been acked by
  numerous subsystem maintainers and the documentation maintainer. I
  figured it was good to get this into 5.10-rc2 to help wih the merge
  issues that would arise if these were to stick in linux-next until
  5.11-rc1.

  The debugfs change has been in linux-next for a long time, and the
  Documentation updates only for the last linux-next release"

* tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (40 commits)
  scripts: get_abi.pl: assume ReST format by default
  docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication
  docs: ABI: sysfs-class-backlight: unify ABI documentation
  docs: ABI: sysfs-c2port: remove a duplicated entry
  docs: ABI: sysfs-class-power: unify duplicated properties
  docs: ABI: unify /sys/class/leds/<led>/brightness documentation
  docs: ABI: stable: remove a duplicated documentation
  docs: ABI: change read/write attributes
  docs: ABI: cleanup several ABI documents
  docs: ABI: sysfs-bus-nvdimm: use the right format for ABI
  docs: ABI: vdso: use the right format for ABI
  docs: ABI: fix syntax to be parsed using ReST notation
  docs: ABI: convert testing/configfs-acpi to ReST
  docs: Kconfig/Makefile: add a check for broken ABI files
  docs: abi-testing.rst: enable --rst-sources when building docs
  docs: ABI: don't escape ReST-incompatible chars from obsolete and removed
  docs: ABI: create a 2-depth index for ABI
  docs: ABI: make it parse ABI/stable as ReST-compatible files
  docs: ABI: sysfs-uevent: make it compatible with ReST output
  docs: ABI: testing: make the files compatible with ReST output
  ...

4 years agoMerge tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 1 Nov 2020 17:57:24 +0000 (09:57 -0800)]
Merge tag 'staging-5.10-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some small staging driver fixes for issues that have been
  reported in 5.10-rc1:

   - octeon driver fixes

   - wfx driver fixes

   - memory leak fix in vchiq driver

   - fieldbus driver bugfix

   - comedi driver bugfix

  All of these have been in linux-next with no reported issues"

* tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fieldbus: anybuss: jump to correct label in an error path
  staging: wfx: fix test on return value of gpiod_get_value()
  staging: wfx: fix use of uninitialized pointer
  staging: mmal-vchiq: Fix memory leak for vchiq_instance
  staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
  staging: octeon: Drop on uncorrectable alignment or FCS error
  staging: octeon: repair "fixed-link" support

4 years agoMerge tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 1 Nov 2020 17:55:36 +0000 (09:55 -0800)]
Merge tag 'tty-5.10-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small TTY and Serial driver fixes for reported issues
  for 5.10-rc2. They include:

   - vt ioctl bugfix for reported problems

   - fsl_lpuart serial driver fix

   - 21285 serial driver bugfix

  All have been in linux-next with no reported issues"

* tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt_ioctl: fix GIO_UNIMAP regression
  vt: keyboard, extend func_buf_lock to readers
  vt: keyboard, simplify vt_kdgkbsent
  tty: serial: fsl_lpuart: LS1021A has a FIFO size of 16 words, like LS1028A
  tty: serial: 21285: fix lockup on open

4 years agoMerge tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 1 Nov 2020 17:53:38 +0000 (09:53 -0800)]
Merge tag 'usb-5.10-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are a number of small bugfixes for reported issues in some USB
  drivers. They include:

   - typec bugfixes

   - xhci bugfixes and lockdep warning fixes

   - cdc-acm driver regression fix

   - kernel doc fixes

   - cdns3 driver bugfixes for a bunch of reported issues

   - other tiny USB driver fixes

  All have been in linux-next with no reported issues"

* tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdns3: gadget: own the lock wrongly at the suspend routine
  usb: cdns3: Fix on-chip memory overflow issue
  usb: cdns3: gadget: suspicious implicit sign extension
  xhci: Don't create stream debugfs files with spinlock held.
  usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
  xhci: Fix sizeof() mismatch
  usb: typec: stusb160x: fix signedness comparison issue with enum variables
  usb: typec: add missing MODULE_DEVICE_TABLE() to stusb160x
  USB: apple-mfi-fastcharge: don't probe unhandled devices
  usbcore: Check both id_table and match() when both available
  usb: host: ehci-tegra: Fix error handling in tegra_ehci_probe()
  usb: typec: stusb160x: fix an IS_ERR() vs NULL check in probe
  usb: typec: tcpm: reset hard_reset_count for any disconnect
  usb: cdc-acm: fix cooldown mechanism
  usb: host: fsl-mph-dr-of: check return of dma_set_mask()
  usb: fix kernel-doc markups
  usb: typec: stusb160x: fix some signedness bugs
  usb: cdns3: Variable 'length' set but not used

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 1 Nov 2020 17:43:32 +0000 (09:43 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:
   - selftest fix
   - force PTE mapping on device pages provided via VFIO
   - fix detection of cacheable mapping at S2
   - fallback to PMD/PTE mappings for composite huge pages
   - fix accounting of Stage-2 PGD allocation
   - fix AArch32 handling of some of the debug registers
   - simplify host HYP entry
   - fix stray pointer conversion on nVHE TLB invalidation
   - fix initialization of the nVHE code
   - simplify handling of capabilities exposed to HYP
   - nuke VCPUs caught using a forbidden AArch32 EL0

  x86:
   - new nested virtualization selftest
   - miscellaneous fixes
   - make W=1 fixes
   - reserve new CPUID bit in the KVM leaves"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx: remove unused variable
  KVM: selftests: Don't require THP to run tests
  KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
  KVM: selftests: test behavior of unmapped L2 APIC-access address
  KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
  KVM: x86: replace static const variables with macros
  KVM: arm64: Handle Asymmetric AArch32 systems
  arm64: cpufeature: upgrade hyp caps to final
  arm64: cpufeature: reorder cpus_have_{const, final}_cap()
  KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
  KVM: arm64: Force PTE mapping on fault resulting in a device mapping
  KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
  KVM: arm64: Fix masks in stage2_pte_cacheable()
  KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
  KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
  KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
  KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
  KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
  x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID

4 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sat, 31 Oct 2020 21:41:48 +0000 (14:41 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull vhost fixes from Michael Tsirkin:
 "Fixes all over the place.

  A new UAPI is borderline: can also be considered a new feature but
  also seems to be the only way we could come up with to fix addressing
  for userspace - and it seems important to switch to it now before
  userspace making assumptions about addressing ability of devices is
  set in stone"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpasim: allow to assign a MAC address
  vdpasim: fix MAC address configuration
  vdpa: handle irq bypass register failure case
  vdpa_sim: Fix DMA mask
  Revert "vhost-vdpa: fix page pinning leakage in error path"
  vdpa/mlx5: Fix error return in map_direct_mr()
  vhost_vdpa: Return -EFAULT if copy_from_user() fails
  vdpa_sim: implement get_iova_range()
  vhost: vdpa: report iova range
  vdpa: introduce config op to get valid iova range

4 years agoMerge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 31 Oct 2020 21:31:28 +0000 (14:31 -0700)]
Merge tag 'flexible-array-conversions-5.10-rc2' of git://git./linux/kernel/git/gustavoars/linux

Pull more flexible-array member conversions from Gustavo A. R. Silva:
 "Replace zero-length arrays with flexible-array members"

* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  printk: ringbuffer: Replace zero-length array with flexible-array member
  net/smc: Replace zero-length array with flexible-array member
  net/mlx5: Replace zero-length array with flexible-array member
  mei: hw: Replace zero-length array with flexible-array member
  gve: Replace zero-length array with flexible-array member
  Bluetooth: btintel: Replace zero-length array with flexible-array member
  scsi: target: tcmu: Replace zero-length array with flexible-array member
  ima: Replace zero-length array with flexible-array member
  enetc: Replace zero-length array with flexible-array member
  fs: Replace zero-length array with flexible-array member
  Bluetooth: Replace zero-length array with flexible-array member
  params: Replace zero-length array with flexible-array member
  tracepoint: Replace zero-length array with flexible-array member
  platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
  platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
  mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
  dmaengine: ti-cppi5: Replace zero-length array with flexible-array member

4 years agoMerge tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sat, 31 Oct 2020 19:25:58 +0000 (12:25 -0700)]
Merge tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Fix an integer overflow on 32-bit platforms in the new DMA range code
  (Geert Uytterhoeven)"

* tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n

4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 31 Oct 2020 19:21:04 +0000 (12:21 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four driver fixes and one core fix.

  The core fix closes a race window where we could kick off a second
  asynchronous scan because the test and set of the variable preventing
  it isn't atomic"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: hisi_sas: Stop using queue #0 always for v2 hw
  scsi: ibmvscsi: Fix potential race after loss of transport
  scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
  scsi: qla2xxx: Return EBUSY on fcport deletion
  scsi: core: Don't start concurrent async scan on same host

4 years agoKVM: vmx: remove unused variable
Paolo Bonzini [Sat, 31 Oct 2020 15:38:09 +0000 (11:38 -0400)]
KVM: vmx: remove unused variable

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Don't require THP to run tests
Andrew Jones [Thu, 29 Oct 2020 20:17:00 +0000 (21:17 +0100)]
KVM: selftests: Don't require THP to run tests

Unless we want to test with THP, then we shouldn't require it to be
configured by the host kernel. Unfortunately, even advising with
MADV_NOHUGEPAGE does require it, so check for THP first in order
to avoid madvise failing with EINVAL.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-2-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
Vitaly Kuznetsov [Wed, 14 Oct 2020 14:33:46 +0000 (16:33 +0200)]
KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again

It was noticed that evmcs_sanitize_exec_ctrls() is not being executed
nowadays despite the code checking 'enable_evmcs' static key looking
correct. Turns out, static key magic doesn't work in '__init' section
(and it is unclear when things changed) but setup_vmcs_config() is called
only once per CPU so we don't really need it to. Switch to checking
'enlightened_vmcs' instead, it is supposed to be in sync with
'enable_evmcs'.

Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded
extra newline from it.

Reported-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201014143346.2430936-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: test behavior of unmapped L2 APIC-access address
Jim Mattson [Mon, 26 Oct 2020 18:09:22 +0000 (11:09 -0700)]
KVM: selftests: test behavior of unmapped L2 APIC-access address

Add a regression test for commit 671ddc700fd0 ("KVM: nVMX: Don't leak
L1 MMIO regions to L2").

First, check to see that an L2 guest can be launched with a valid
APIC-access address that is backed by a page of L1 physical memory.

Next, set the APIC-access address to a (valid) L1 physical address
that is not backed by memory. KVM can't handle this situation, so
resuming L2 should result in a KVM exit for internal error
(emulation).

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201026180922.3120555-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'block-5.10-2020-10-30' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 30 Oct 2020 22:02:49 +0000 (15:02 -0700)]
Merge tag 'block-5.10-2020-10-30' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - null_blk zone fixes (Damien, Kanchan)

 - NVMe pull request from Christoph:
       - improve zone revalidation (Keith Busch)
       - gracefully handle zero length messages in nvme-rdma (zhenwei pi)
       - nvme-fc error handling fixes (James Smart)
       - nvmet tracing NULL pointer dereference fix (Chaitanya Kulkarni)"

 - xsysace platform fixes (Andy)

 - scatterlist type cleanup (David)

 - blk-cgroup memory fixes (Gabriel)

 - nbd block size update fix (Ming)

 - Flush completion state fix (Ming)

 - bio_add_hw_page() iteration fix (Naohiro)

* tag 'block-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
  blk-mq: mark flush request as IDLE in flush_end_io()
  lib/scatterlist: use consistent sg_copy_buffer() return type
  xsysace: use platform_get_resource() and platform_get_irq_optional()
  null_blk: Fix locking in zoned mode
  null_blk: Fix zone reset all tracing
  nbd: don't update block size after device is started
  block: advance iov_iter on bio_add_hw_page failure
  null_blk: synchronization fix for zoned device
  nvmet: fix a NULL pointer dereference when tracing the flush command
  nvme-fc: remove nvme_fc_terminate_io()
  nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery
  nvme-fc: remove err_work work item
  nvme-fc: track error_recovery while connecting
  nvme-rdma: handle unexpected nvme completion data length
  nvme: ignore zone validate errors on subsequent scans
  blk-cgroup: Pre-allocate tree node on blkg_conf_prep
  blk-cgroup: Fix memleak on error path

4 years agoprintk: ringbuffer: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Tue, 27 Oct 2020 18:50:19 +0000 (13:50 -0500)]
printk: ringbuffer: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agonet/smc: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Tue, 27 Oct 2020 06:33:56 +0000 (01:33 -0500)]
net/smc: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agonet/mlx5: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Tue, 27 Oct 2020 20:28:40 +0000 (15:28 -0500)]
net/mlx5: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agomei: hw: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Tue, 27 Oct 2020 06:15:49 +0000 (01:15 -0500)]
mei: hw: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agogve: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Tue, 27 Oct 2020 21:30:45 +0000 (16:30 -0500)]
gve: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure.  Kernel code
should always use “flexible array members”[1] for these cases.  The
older style of one-element or zero-length arrays should no longer be
used[2].

Refactor the code according to the use of a flexible-array member in
struct gve_stats_report, instead of a zero-length array, and use the
struct_size() helper to calculate the size for the resource allocation.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agoBluetooth: btintel: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Tue, 27 Oct 2020 05:54:08 +0000 (00:54 -0500)]
Bluetooth: btintel: Replace zero-length array with flexible-array member

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
4 years agoMerge tag 'io_uring-5.10-2020-10-30' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 30 Oct 2020 21:55:36 +0000 (14:55 -0700)]
Merge tag 'io_uring-5.10-2020-10-30' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Fixes for linked timeouts (Pavel)

 - Set IO_WQ_WORK_CONCURRENT early for async offload (Pavel)

 - Two minor simplifications that make the code easier to read and
   follow (Pavel)

* tag 'io_uring-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
  io_uring: use type appropriate io_kiocb handler for double poll
  io_uring: simplify __io_queue_sqe()
  io_uring: simplify nxt propagation in io_queue_sqe
  io_uring: don't miss setting IO_WQ_WORK_CONCURRENT
  io_uring: don't defer put of cancelled ltimeout
  io_uring: always clear LINK_TIMEOUT after cancel
  io_uring: don't adjust LINK_HEAD in cancel ltimeout
  io_uring: remove opcode check on ltimeout kill

4 years agoMerge tag 'libata-5.10-2020-10-30' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 30 Oct 2020 21:51:01 +0000 (14:51 -0700)]
Merge tag 'libata-5.10-2020-10-30' of git://git.kernel.dk/linux-block

Pull libata fix from Jens Axboe:
 "Single fix for an old regression with sata_nv"

* tag 'libata-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
  ata: sata_nv: Fix retrieving of active qcs

4 years agoMerge tag 'for-5.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 30 Oct 2020 20:29:49 +0000 (13:29 -0700)]
Merge tag 'for-5.10-rc1-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - lockdep fixes:
     - drop path locks before manipulating sysfs objects or qgroups
     - preliminary fixes before tree locks get switched to rwsem
     - use annotated seqlock

 - build warning fixes (printk format)

 - fix relocation vs fallocate race

 - tree checker properly validates number of stripes and parity

 - readahead vs device replace fixes

 - iomap dio fix for unnecessary buffered io fallback

* tag 'for-5.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: convert data_seqcount to seqcount_mutex_t
  btrfs: don't fallback to buffered read if we don't need to
  btrfs: add a helper to read the tree_root commit root for backref lookup
  btrfs: drop the path before adding qgroup items when enabling qgroups
  btrfs: fix readahead hang and use-after-free after removing a device
  btrfs: fix use-after-free on readahead extent after failure to create it
  btrfs: tree-checker: validate number of chunk stripes and parity
  btrfs: tree-checker: fix incorrect printk format
  btrfs: drop the path before adding block group sysfs files
  btrfs: fix relocation failure due to race with fallocate

4 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 30 Oct 2020 20:16:03 +0000 (13:16 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The diffstat is a bit spread out thanks to an invasive CPU erratum
  workaround which missed the merge window and also a bunch of fixes to
  the recently added MTE selftests.

   - Fixes to MTE kselftests

   - Fix return code from KVM Spectre-v2 hypercall

   - Build fixes for ld.lld and Clang's infamous integrated assembler

   - Ensure RCU is up and running before we use printk()

   - Workaround for Cortex-A77 erratum 1508412

   - Fix linker warnings from unexpected ELF sections

   - Ensure PE/COFF sections are 64k aligned"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.S
  arm64/smp: Move rcu_cpu_starting() earlier
  arm64: Add workaround for Arm Cortex-A77 erratum 1508412
  arm64: Add part number for Arm Cortex-A77
  arm64: mte: Document that user PSTATE.TCO is ignored by kernel uaccess
  module: use hidden visibility for weak symbol references
  arm64: efi: increase EFI PE/COFF header padding to 64 KB
  arm64: vmlinux.lds: account for spurious empty .igot.plt sections
  kselftest/arm64: Fix check_user_mem test
  kselftest/arm64: Fix check_ksm_options test
  kselftest/arm64: Fix check_mmap_options test
  kselftest/arm64: Fix check_child_memory test
  kselftest/arm64: Fix check_tags_inclusion test
  kselftest/arm64: Fix check_buffer_fill test
  arm64: avoid -Woverride-init warning
  KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED
  arm64: vdso32: Allow ld.lld to properly link the VDSO