Arnaldo Carvalho de Melo [Wed, 3 Feb 2016 20:24:11 +0000 (17:24 -0300)]
perf build tests: Move the feature related vars to the front of the make cmdline
So that we do less visual searching on the 'make build-test' output to
see the feature related variables:
After:
$ make -C tools/perf build-test
<SNIP>
make_no_newt_O: cd . && make NO_NEWT=1 FEATURES_DUMP=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.dz55IX DESTDIR=/tmp/tmp.X29xxo
make_tags_O: cd . && make tags FEATURES_DUMP=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.6ecLh8 DESTDIR=/tmp/tmp.6vIla578Ho
make_util_pmu_bison_o_O: cd . && make util/pmu-bison.o FEATURES_DUMP=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.SVPM2G DESTDIR=/tmp/tmp.C0oAam
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dx4krgzqa566v1pedrbrcchi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 3 Feb 2016 20:16:32 +0000 (17:16 -0300)]
perf build tests: Elide "-f Makefile" from make invokation
Since this is the name that 'make' will look for if no explicit -f file
is passed.
This in turn makes the output of 'build-test' more compact:
Before:
$ perf stat make -C tools/perf build-test
<SNIP>
cd . && make FEATURE_DUMP_COPY=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump
make_no_libaudit_O: cd . && make -f Makefile O=/tmp/tmp.tHIa0Kkk2Y DESTDIR=/tmp/tmp.foK7rckkVi NO_LIBAUDIT=1 FEATURES_DUMP=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP
<SNIP>
After:
$ perf stat make -C tools/perf build-test
<SNIP>
make_no_libaudit_O: cd . && make O=/tmp/tmp.tHIa0Kkk2Y DESTDIR=/tmp/tmp.foK7rckkVi NO_LIBAUDIT=1 FEATURES_DUMP=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP
<SNIP>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-m440lb8dkfsywsyah0htif6t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Thu, 4 Feb 2016 07:58:01 +0000 (08:58 +0100)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Add 'L' hotkey to dynamicly set the percent threshold for histogram
entries and callchains, i.e. dynamicly do what the --percent-limit
command line option to 'top' and 'report' does. (Namhyung Kim)
Infrastructure changes:
- Per hists field and sort lists, that will be used, for instance,
in the c2c tool (Jiri Olsa)
Documentation changes:
- Update documentation of --sort and --perf-limit options
for 'perf report' (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 4 Feb 2016 07:57:44 +0000 (08:57 +0100)]
Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 4 Feb 2016 07:56:54 +0000 (08:56 +0100)]
Merge tag 'perf-urgent-for-mingo-2' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Fix 'perf stat' interval output values (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 4 Feb 2016 07:55:00 +0000 (08:55 +0100)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- tracepoint_error() can receive e=NULL, robustify it, fixes a problem noticed
with a very specific combination: Machine with Intel PT (e.g. Broadwell),
kernel with no perf_event_attr.context_switch feature (e.g. 4.2) and unreadable
tracefs (for instance !root users), making the fallback from
perf_event_attr.context_switch to the sched:sched_switch tracepoint to fail
reading its info from tracefs, fix it. (Adrian Hunter)
- Fix segfault in intel PT, by making it follow the 'struct thread' lifetime cycle
checking expectations, noticed for instance, when processing perf.data files with
Intel PT data using 'perf script' and when exiting 'perf report' (Adrian Hunter)
- Fix CFI usage from .eh_frame and .debug_frame, which sometimes requires that we
fallback from .eh_frame to .debug_frame in architectures such as PowerPC (Hemant Kumar)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Wed, 3 Feb 2016 07:43:56 +0000 (08:43 +0100)]
perf stat: Fix interval output values
We broke interval data displays with commit:
3f416f22d1e2 ("perf stat: Do not clean event's private stats")
This commit removed stats cleaning, which is important for '-r' option
to carry counters data over the whole run. But it's necessary to clean
it for interval mode, otherwise the displayed value is avg of all
previous values.
Before:
$ perf stat -e cycles -a -I 1000 record
# time counts unit events
1.
000240796 75,216,287 cycles
2.
000512791 107,823,524 cycles
$ perf stat report
# time counts unit events
1.
000240796 75,216,287 cycles
2.
000512791 91,519,906 cycles
Now:
$ perf stat report
# time counts unit events
1.
000240796 75,216,287 cycles
2.
000512791 107,823,524 cycles
Notice the second value being bigger (91,.. < 107,..).
This could be easily verified by using perf script which displays raw
stat data:
$ perf script
CPU THREAD VAL ENA RUN TIME EVENT
0 -1
23855779 1000209530 1000209530 1000240796 cycles
1 -1
33340397 1000224964 1000224964 1000240796 cycles
2 -1
15835415 1000226695 1000226695 1000240796 cycles
3 -1 2184696
1000228245 1000228245 1000240796 cycles
0 -1
97014312 2000514533 2000514533 2000512791 cycles
1 -1
46121497 2000543795 2000543795 2000512791 cycles
2 -1
32269530 2000543566 2000543566 2000512791 cycles
3 -1 7634472
2000544108 2000544108 2000512791 cycles
The sum of the first 4 values is the first interval aggregated value:
23855779 +
33340397 +
15835415 + 2184696 = 75,216,287
The sum of the second 4 values minus first value is the second interval
aggregated value:
97014312 +
46121497 +
32269530 + 7634472 -
75216287 = 107,823,524
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1454485436-20639-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Wed, 3 Feb 2016 18:10:02 +0000 (10:10 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"18 fixes"
[ The 18 fixes turned into 17 commits, because one of the fixes was a
fix for another patch in the series that I just folded in by editing
the patch manually - hopefully correctly - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix memory leak in copy_huge_pmd()
drivers/hwspinlock: fix race between radix tree insertion and lookup
radix-tree: fix race in gang lookup
mm/vmpressure.c: fix subtree pressure detection
mm: polish virtual memory accounting
mm: warn about VmData over RLIMIT_DATA
Documentation: cgroup-v2: add memory.stat::sock description
mm: memcontrol: drop superfluous entry in the per-memcg stats array
drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
proc: revert /proc/<pid>/maps [stack:TID] annotation
numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
MAINTAINERS: update Seth email
ocfs2/cluster: fix memory leak in o2hb_region_release
lib/test-string_helpers.c: fix and improve string_get_size() tests
thp: limit number of object to scan on deferred_split_scan()
thp: change deferred_split_count() to return number of THP in queue
thp: make split_queue per-node
Linus Torvalds [Wed, 3 Feb 2016 18:04:58 +0000 (10:04 -0800)]
Merge tag 'for-linus-4.5-2' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI fix from Corey Minyard:
"Fix a compile error on IPMI when ACPI is disabled"
* tag 'for-linus-4.5-2' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: put acpi.h with the other headers
Linus Torvalds [Wed, 3 Feb 2016 17:55:50 +0000 (09:55 -0800)]
Merge tag 'devicetree-fixes-for-4.5' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- Fix build error with *_OF_DECLARE() when used in modules
- Add missing platform maintainers for dts files in MAINTAINERS
* tag 'devicetree-fixes-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: drop symbols declared by _OF_DECLARE() from modules
MAINTAINERS: Add missing platform maintainers for dts files
Linus Torvalds [Wed, 3 Feb 2016 17:36:41 +0000 (09:36 -0800)]
Merge tag 'nfs-for-4.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfix and cleanup from Trond Myklebust:
"Bugfix:
- pNFS: Fix for missing layoutreturn calls
Cleanup:
- pNFS: rename NFS_LAYOUT_RETURN_BEFORE_CLOSE for code clarity"
* tag 'nfs-for-4.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSE
pNFS: Fix missing layoutreturn calls
Linus Torvalds [Wed, 3 Feb 2016 17:31:34 +0000 (09:31 -0800)]
Merge tag 'trace-v4.5-rc1-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"A cleanup to the stack tracer broke stack tracing on s390. Here's a
simple fix to correct that issue"
* tag 'trace-v4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/stacktrace: Show entire trace if passed in function not found
Hugh Dickins [Sun, 31 Jan 2016 02:03:16 +0000 (18:03 -0800)]
mm: retire GUP WARN_ON_ONCE that outlived its usefulness
Trinity is now hitting the WARN_ON_ONCE we added in v3.15 commit
cda540ace6a1 ("mm: get_user_pages(write,force) refuse to COW in shared
areas"). The warning has served its purpose, nobody was harmed by that
change, so just remove the warning to generate less noise from Trinity.
Which reminds me of the comment I wrongly left behind with that commit
(but was spotted at the time by Kirill), which has since moved into a
separate function, and become even more obscure: delete it.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tony Camuso [Tue, 2 Feb 2016 18:57:24 +0000 (13:57 -0500)]
ipmi: put acpi.h with the other headers
Enclosing '#include <linux/acpi.h>' within '#ifdef CONFIG_ACPI' is
unnecessary, since it has its own conditional compile for CONFIG_ACPI.
Commit
0fbcf4af7c83 ("ipmi: Convert the IPMI SI ACPI handling to a
platform device") exposed this as a problem for platforms that do not
support ACPI when it introduced a call to ACPI_PTR() macro outside of
the CONFIG_ACPI conditional compile. This would have been perfectly
acceptable if acpi.h were not conditionally excluded for the non-acpi
platform, because the conditional compile within acpi.h defines
ACPI_PTR() to return NULL when compiled for non acpi platforms.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Fixed commit reference in header to conform to standard.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Matthew Wilcox [Wed, 3 Feb 2016 00:57:57 +0000 (16:57 -0800)]
mm: fix memory leak in copy_huge_pmd()
We allocate a pgtable but do not attach it to anything if the PMD is in
a DAX VMA, causing it to leak.
We certainly try to not free pgtables associated with the huge zero page
if the zero page is in a DAX VMA, so I think this is the right solution.
This needs to be properly audited.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Wed, 3 Feb 2016 00:57:55 +0000 (16:57 -0800)]
drivers/hwspinlock: fix race between radix tree insertion and lookup
of_hwspin_lock_get_id() is protected by the RCU lock, which means that
insertions can occur simultaneously with the lookup. If the radix tree
transitions from a height of 0, we can see a slot with the indirect_ptr
bit set, which will cause us to at least read random memory, and could
cause other havoc.
Fix this by using the newly introduced radix_tree_iter_retry().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Wed, 3 Feb 2016 00:57:52 +0000 (16:57 -0800)]
radix-tree: fix race in gang lookup
If the indirect_ptr bit is set on a slot, that indicates we need to redo
the lookup. Introduce a new function radix_tree_iter_retry() which
forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.
This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0. The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.
Fixes:
cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Wed, 3 Feb 2016 00:57:49 +0000 (16:57 -0800)]
mm/vmpressure.c: fix subtree pressure detection
When vmpressure is called for the entire subtree under pressure we
mistakenly use vmpressure->scanned instead of vmpressure->tree_scanned
when checking if vmpressure work is to be scheduled. This results in
suppressing all vmpressure events in the legacy cgroup hierarchy. Fix it.
Fixes:
8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Wed, 3 Feb 2016 00:57:46 +0000 (16:57 -0800)]
mm: polish virtual memory accounting
* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
* always account VMAs with flag VM_STACK as stack (as it was before)
* cleanup classifying helpers
* update comments and documentation
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Wed, 3 Feb 2016 00:57:43 +0000 (16:57 -0800)]
mm: warn about VmData over RLIMIT_DATA
This patch provides a way of working around a slight regression
introduced by commit
84638335900f ("mm: rework virtual memory
accounting").
Before that commit RLIMIT_DATA have control only over size of the brk
region. But that change have caused problems with all existing versions
of valgrind, because it set RLIMIT_DATA to zero.
This patch fixes rlimit check (limit actually in bytes, not pages) and
by default turns it into warning which prints at first VmData misuse:
"mmap: top (795): VmData 516096 exceed data ulimit 512000. Will be forbidden soon."
Behavior is controlled by boot param ignore_rlimit_data=y/n and by sysfs
/sys/module/kernel/parameters/ignore_rlimit_data. For now it set to "y".
[akpm@linux-foundation.org: tweak kernel-parameters.txt text[
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/20151228211015.GL2194@uranus
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kees Cook <keescook@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 3 Feb 2016 00:57:41 +0000 (16:57 -0800)]
Documentation: cgroup-v2: add memory.stat::sock description
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 3 Feb 2016 00:57:38 +0000 (16:57 -0800)]
mm: memcontrol: drop superfluous entry in the per-memcg stats array
MEM_CGROUP_STAT_NSTATS is just a delimiter for cgroup1 statistics, not
an actual array entry. Reuse it for the first cgroup2 stat entry, like
in the event array.
Fixes:
b2807f07f4f8 ("mm: memcontrol: add "sock" to cgroup2 memory.stat")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 3 Feb 2016 00:57:35 +0000 (16:57 -0800)]
drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
Reduced testcase:
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <numaif.h>
#define SIZE 0x2000
int main()
{
int fd;
void *p;
fd = open("/dev/sg0", O_RDWR);
p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE | MAP_LOCKED, fd, 0);
mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE);
return 0;
}
We shouldn't try to migrate pages in sg VMA as we don't have a way to
update Sg_scatter_hold::pages accordingly from mm core.
Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not
migratable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Shiraz Hashim <shashim@codeaurora.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 3 Feb 2016 00:57:29 +0000 (16:57 -0800)]
proc: revert /proc/<pid>/maps [stack:TID] annotation
Commit
b76437579d13 ("procfs: mark thread stack correctly in
proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
Finding the task of a stack VMA requires walking the entire thread list,
turning this into quadratic behavior: a thousand threads means a
thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
million combinations.
The cost is not in proportion to the usefulness as described in the
patch.
Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
/proc/<pid>/numa_maps) usable again for higher thread counts.
The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
identifying the stack VMA there is an O(1) operation.
Siddesh said:
"The end users needed a way to identify thread stacks programmatically and
there wasn't a way to do that. I'm afraid I no longer remember (or have
access to the resources that would aid my memory since I changed
employers) the details of their requirement. However, I did do this on my
own time because I thought it was an interesting project for me and nobody
really gave any feedback then as to its utility, so as far as I am
concerned you could roll back the main thread maps information since the
information is available in the thread-specific files"
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Holzheu [Wed, 3 Feb 2016 00:57:26 +0000 (16:57 -0800)]
numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
When working with hugetlbfs ptes (which are actually pmds) is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different. This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().
Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get(). On s390 this leads to the following two problems:
1) The pte_present() function returns false (instead of true) for
PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
completely in the "numa_maps" output.
2) The pte_dirty() function always returns false for all hugetlb ptes.
Therefore these pages are reported as "mapped=xxx" instead of
"dirty=xxx".
Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Seth Jennings [Wed, 3 Feb 2016 00:57:23 +0000 (16:57 -0800)]
MAINTAINERS: update Seth email
Update/unify my contact info. The old email address will no longer work
soon.
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Wed, 3 Feb 2016 00:57:21 +0000 (16:57 -0800)]
ocfs2/cluster: fix memory leak in o2hb_region_release
o2hb_region_release currently doesn't free o2hb_debug_buf
hr_db_elapsed_time and hr_db_pinned malloced in o2hb_debug_create. Also
we should call debugfs_remove before freeing its data, to prevent the risk
accessing debugfs rightly after its data has been freed.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Kuznetsov [Wed, 3 Feb 2016 00:57:18 +0000 (16:57 -0800)]
lib/test-string_helpers.c: fix and improve string_get_size() tests
Recently added commit
564b026fbd0d ("string_helpers: fix precision loss
for some inputs") fixed precision issues for string_get_size() and broke
tests.
Fix and improve them: test both STRING_UNITS_2 and STRING_UNITS_10 at a
time, better failure reporting, test small an huge values.
Fixes:
564b026fbd0d28e9 ("string_helpers: fix precision loss for some inputs")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 3 Feb 2016 00:57:15 +0000 (16:57 -0800)]
thp: limit number of object to scan on deferred_split_scan()
If we have a lot of pages in queue to be split, deferred_split_scan()
can spend unreasonable amount of time under spinlock with disabled
interrupts.
Let's cap number of pages to split on scan by sc->nr_to_scan.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 3 Feb 2016 00:57:12 +0000 (16:57 -0800)]
thp: change deferred_split_count() to return number of THP in queue
I've got meaning of shrinker::count_objects() wrong: it should return
number of potentially freeable objects, which is not necessary correlate
with freeable memory.
Returning 256 per THP in queue is not reasonable:
shrinker::scan_objects() never called with nr_to_scan > 128 in my setup.
Let's return 1 per THP and correct scan_object accordingly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 3 Feb 2016 00:57:08 +0000 (16:57 -0800)]
thp: make split_queue per-node
Andrea Arcangeli suggested to make split queue per-node to improve
scalability. Let's do it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 3 Feb 2016 14:11:23 +0000 (23:11 +0900)]
perf hists browser: Add 'L' hotkey to change percent limit
Add 'L' key action to change the percent limit applied to both of hist
entries and callchains.
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 3 Feb 2016 14:11:21 +0000 (23:11 +0900)]
perf report: Update documention of --percent-limit option
The --percent-limit option was changed to be applied to callchains as
well as to hist entries recently, but it missed to update the doc.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 3 Feb 2016 14:11:20 +0000 (23:11 +0900)]
perf report: Update documentation of --sort option
The description of the memory sort key (used by --mem-mode) was
misplaced. Move it under the --sort option so that it can be referenced
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:24 +0000 (10:24 +0100)]
perf hists: Introduce hists__for_each_sort_list macro
With the hist object having the perf_hpp_list we can now iterate sort
format entries based in the hists object. Adding
hists__for_each_sort_list macro to do that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-27-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:23 +0000 (10:24 +0100)]
perf hists: Introduce hists__for_each_format macro
With the hist object having the perf_hpp_list we can now iterate output
format entries based in the hists object. Adding hists__for_each_format
macro to do that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-26-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:22 +0000 (10:24 +0100)]
perf tools: Add hpp_list into struct hists object
Adding hpp_list into struct hists object.
Initializing struct hists_evsel hists object to carry global
perf_hpp_list list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-25-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:21 +0000 (10:24 +0100)]
perf hists: Add struct perf_hpp_list argument to helper functions
Adding struct perf_hpp_list argument to following helper functions:
void perf_hpp__setup_output_field(struct perf_hpp_list *list);
void perf_hpp__reset_output_field(struct perf_hpp_list *list);
void perf_hpp__append_sort_keys(struct perf_hpp_list *list);
so they could be used on hists's hpp_list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-24-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:20 +0000 (10:24 +0100)]
perf hists: Introduce perf_hpp_list__for_each_sort_list_safe macro
Introducing perf_hpp_list__for_each_sort_list_safe macro
to iterate perf_hpp_list object's sort entries safely.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:19 +0000 (10:24 +0100)]
perf hists: Introduce perf_hpp_list__for_each_sort_list macro
Introducing perf_hpp_list__for_each_sort_list macro to iterate
perf_hpp_list object's sort entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-22-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:18 +0000 (10:24 +0100)]
perf hists: Introduce perf_hpp_list__for_each_format_safe macro
Introducing perf_hpp_list__for_each_format_safe macro to iterate
perf_hpp_list object's output entries safely.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-21-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:17 +0000 (10:24 +0100)]
perf hists: Introduce perf_hpp_list__for_each_format macro
Introducing perf_hpp_list__for_each_format macro to iterate
perf_hpp_list object's output entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-20-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:16 +0000 (10:24 +0100)]
perf hists: Pass perf_hpp_list all the way through setup_output_list
Passing perf_hpp_list all the way through setup_output_list so the
output entry could be added on the arbitrary list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-19-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:14 +0000 (10:24 +0100)]
perf hists: Add perf_hpp_list register helpers
Adding 2 perf_hpp_list register helpers:
perf_hpp_list__column_register()
perf_hpp_list__register_sort_field()
to be called within existing helpers:
perf_hpp__column_register()
perf_hpp__register_sort_field()
to register format entries within global perf_hpp_list object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-17-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:13 +0000 (10:24 +0100)]
perf hists: Introduce perf_hpp_list__init function
Introducing perf_hpp_list__init function to have an easy way to
initialize perf_hpp_list struct.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-16-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:12 +0000 (10:24 +0100)]
perf hists: Introduce struct perf_hpp_list
Gather output and sort lists under struct perf_hpp_list, so we could
have multiple instancies of sort/output format entries.
Replacing current perf_hpp__list and perf_hpp__sort_list lists with
single perf_hpp_list instance.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-15-git-send-email-jolsa@kernel.org
[ Renamed fields to .{fields,sorts} as suggested by Namhyung and acked by Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:11 +0000 (10:24 +0100)]
perf hists: Separate output fields parsing into setup_output_list function
Separating output fields parsing into setup_output_list function, so
it's separated from field_order string setup and could be reused later
in following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-14-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:10 +0000 (10:24 +0100)]
perf hists: Separate sort fields parsing into setup_sort_list function
Separating sort fields parsing into setup_sort_list function, so it's
separated from sort_order string setup and could be reused later in
following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-13-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:09 +0000 (10:24 +0100)]
perf hists: Properly release format fields
With multiple list holding format entries, we need the support properly
releasing format output/sort fields.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:08 +0000 (10:24 +0100)]
perf hists: Remove perf_hpp__column_(disable|enable)
Those functions are no longer needed. They operate over perf_hpp__format
array which is now used only as template for dynamic entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-11-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:07 +0000 (10:24 +0100)]
perf hists: Allocate output sort field
Currently we use static output fields, because we have single global
list of all sort/output fields.
We will add hists specific sort and output lists in following patches,
so we need all format entries to be dynamically allocated. Adding
support to allocate output sort field.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 18 Jan 2016 09:24:06 +0000 (10:24 +0100)]
perf top: Move UI initialization ahead of sort setup
The ui initialization changes hpp format callbacks, based on the used
browser. Thus we need this init being processed before setup_sorting.
Replica of a patch by Jiri for 'perf report'.
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:06 +0000 (10:24 +0100)]
perf report: Move UI initialization ahead of sort setup
The ui initialization changes hpp format callbacks, based on the used
browser. Thus we need this init being processed before setup_sorting.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:05 +0000 (10:24 +0100)]
perf hists: Make hpp setup function generic
Now that we have the 'equal' method implemented for hpp format entries
we can ease up the logic in the following functions and make them
generic wrt comparing format entries:
perf_hpp__setup_output_field
perf_hpp__append_sort_keys
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:04 +0000 (10:24 +0100)]
perf hists: Add 'hpp__equal' callback function
Adding 'hpp__equal' callback function to compare hpp output format
entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:03 +0000 (10:24 +0100)]
perf hists: Add 'equal' method to perf_hpp_fmt struct
To easily compare format entries and make it available for all kinds of
format entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:02 +0000 (10:24 +0100)]
perf hists: Use struct perf_hpp_fmt::idx in perf_hpp__reset_width
We are going to add dynamic hpp format fields, so we need to make the
'len' change for the format itself, not in the perf_hpp__format
template.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:01 +0000 (10:24 +0100)]
perf hists: Add _idx fields into struct perf_hpp_fmt
Currently there's no way of comparing hpp format entries, which is
needed in following patches.
Adding _idx fields into struct perf_hpp_fmt to recognize and be able to
compare hpp format entries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:24:00 +0000 (10:24 +0100)]
perf hists: Introduce perf_evsel__output_resort function
Adding evsel specific function to sort hists_evsel based hists. The
hists__output_resort can be now used to sort common hists object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 18 Jan 2016 09:23:59 +0000 (10:23 +0100)]
perf hists: Factor output_resort from hists__output_resort
Currently hists__output_resort() depends on hists based on hists_evsel
struct, but we need to be able to sort common hists as well.
Cutting out the sorting base sorting code into output_resort
function, so it can be reused in following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1453109064-1026-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Wed, 3 Feb 2016 10:02:37 +0000 (11:02 +0100)]
Merge tag 'perf-core-for-mingo-3' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core callchain fixes and improvements from Arnaldo Carvalho de Melo <acme@redhat.com:
User visible changes:
- Make --percent-limit apply to callchains also and fix some bugs
related to --percent-limit (Namhyung Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 3 Feb 2016 10:00:29 +0000 (11:00 +0100)]
Merge tag 'perf-core-for-mingo-2' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf tooling changes from Arnaldo Carvalho de Melo:
New features:
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
Infrastructure changes:
- Use the 'feature-dump' target to do the feature checks just once and then
add code to reuse that in the tests/make makefile, speeding up the
'make -C tools/perf build-test' target (Wang Nan)
- Reduce the number of tests the 'build-test' target do to those that don't
pollute the source tree (Arnaldo Carvalho de Melo)
- Improve the output of the build tests a bit by aligning the name of the
tests, more can be done to filter out uninteresting info in the output
(Arnaldo Carvalho de Melo)
- Add perf_evlist pointer to *info_priv_size(), more prep work for
supporting the coresight architecture (Mathieu Poirier)
- Improve the 'perf test bp_signal' test (Wang Nan)
- Check environment before starting the BPF 'perf test', so that we can just
'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)
- Fix cpumode of synthesized buildid event (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 3 Feb 2016 09:58:46 +0000 (10:58 +0100)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Rename the "colors.code" ~/.perfconfig variable to "colors.jump_arrows",
as it controls just the that UI element in the annotate browser (Taeung Song)
- Avoid trying to read ELF symtabs from device files, noticed while doing
memory profiling work (Jiri Olsa)
- Improve context detection when offering options in the hists browser,
i.e. some options don't make sense when the browser is not working with
a perf.data file ('perf top' mode), only in 'perf report' mode, like
scripting (Namhyung Kim)
Infrastructure changes:
- Elliminate duplication in the hists browser filter functions, getting the
common part into a function that receives callbacks for filtering by
DSO, thread, etc. (Namhyung Kim)
- Fix misleadingly indented assignment, found using
gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
- Handle LLVM relocation oddities in libbpf, introducing a 'perf test' that
detects such problems and then fixing the problem, so that the test now
passes (Wang Nan)
- More improvements to the build infrastructure to allow reusing the
feature detection facilities (Wang Nan)
- Auto initialize the globals needed by cpu__max_{cpu,node}() routines
(Arnaldo Carvalho de Melo)
Documentation changes:
- Document the perf sysctls in Documentation/sysctl/kernel.txt (Ben Hutchings)
- Document a bunch more ~/.perfconfig knobs (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hemant Kumar [Tue, 2 Feb 2016 15:26:46 +0000 (20:56 +0530)]
perf probe: Search both .eh_frame and .debug_frame sections for probe location
'perf probe' through debuginfo__find_probes() in util/probe-finder.c
checks for the functions' frame descriptions in either .eh_frame section
of an ELF or the .debug_frame.
The check is based on whether either one of these sections is present.
Depending on distro, toolchain defaults, architetcutre, build flags,
etc., CFI might be found in either .eh_frame and/or .debug_frame.
Sometimes, it may happen that, .eh_frame, even if present, may not be
complete and may miss some descriptions.
Therefore, to be sure, to find the CFI covering an address we will
always have to investigate both if available.
For e.g., in powerpc, this may happen:
$ gcc -g bin.c -o bin
$ objdump --dwarf ./bin
<1><145>: Abbrev Number: 7 (DW_TAG_subprogram)
<146> DW_AT_external : 1
<146> DW_AT_name : (indirect string, offset: 0x9e): main
<14a> DW_AT_decl_file : 1
<14b> DW_AT_decl_line : 39
<14c> DW_AT_prototyped : 1
<14c> DW_AT_type : <0x57>
<150> DW_AT_low_pc : 0x100007b8
If the .eh_frame and .debug_frame are checked for the same binary, we
will find that, .eh_frame (although present) doesn't contain a
description for "main" function.
But, .debug_frame has a description:
000000d8 00000024 00000000 FDE cie=
00000000 pc=
100007b8..
10000838
DW_CFA_advance_loc: 16 to
100007c8
DW_CFA_def_cfa_offset: 144
DW_CFA_offset_extended_sf: r65 at cfa+16
...
Due to this (since, perf checks whether .eh_frame is present and goes on
searching for that address inside that frame), perf is unable to process
the probes:
# perf probe -x ./bin main
Failed to get call frame on 0x100007b8
Error: Failed to add events.
To avoid this issue, we need to check both the sections (.eh_frame and
.debug_frame), which is done in this patch.
Note that, we can always force everything into both .eh_frame and
.debug_frame by:
$ gcc bin.c -fasynchronous-unwind-tables -fno-dwarf2-cfi-asm -g -o bin
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mark Wielaard <mjw@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1454426806-13974-1-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 1 Feb 2016 03:21:04 +0000 (03:21 +0000)]
perf tools: Fix thread lifetime related segfaut in intel_pt
intel_pt_process_auxtrace_info() creates a pt->unknown_thread thread
that eventually needs to be freed by the last thread__put() on it, when
its refcount hits zero, which may happen in
intel_pt_process_auxtrace_info() error handling path and triggers the
following segfault, which would happen as well at intel_pt_free, when
tools using this intel_pt codebase frees up resources:
# perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
0 a anaconda-ks.cfg bin perf.data perf.data.old perf-f23-bringup.todo
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.217 MB perf.data ]
#
# perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
intel_pt_synth_events: failed to synthesize 'instructions' event type
Segmentation fault (core dumped)
#
The problem is: there's a union in 'struct thread' combines a list_head
and a rb_node. The standard life cycle of a thread is: init rb_node in
the constructor, insert it into machine->threads rbtree using rb_node,
move it to machine->dead_threads using list_head, clean in the last
thread__put: list_del_init(&thread->node).
In the above command, it clean a thread before adding it into list,
causes the above segfault.
Since pt->unknown_thread will never live in an rbtree, initialize its
list node so that when list_del_init() is done on it we don't segfault.
After this patch:
# perf script -F event,comm,pid,tid,time,addr,ip,sym,dso,iregs
Samples for 'instructions:u' event do not have IREGS attribute set. Cannot print 'iregs' field.
intel_pt_synth_events: failed to synthesize 'instructions' event type
0x248 [0x88]: failed to process type: 70
#
Reported-by: Tong Zhang <ztong@vt.edu>
Reported-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/1454296865-19749-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 1 Feb 2016 23:56:08 +0000 (15:56 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.
1) Fix on-channel cancellation in mac80211, from Johannes Berg.
2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.
3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.
4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.
5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.
6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.
7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.
8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel
9) Handle fragments properly in ipv4 easly socket demux, from Eric
Dumazet.
10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...
Linus Torvalds [Mon, 1 Feb 2016 23:49:18 +0000 (15:49 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
API:
- algif_hash needs to wait for init operations to complete.
- The has_key setting for shash was always true.
Algorithms:
- Add missing selections of CRYPTO_HASH.
- Fix pkcs7 authentication.
Drivers:
- Fix stack alignment bug in chacha20-ssse3.
- Fix performance regression in caam due to incorrect setting.
- Fix potential compile-only build failure of stm32"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: atmel-aes - remove calls of clk_prepare() from atomic contexts
crypto: algif_hash - wait for crypto_ahash_init() to complete
crypto: shash - Fix has_key setting
hwrng: stm32 - Fix dependencies for !HAS_IOMEM archs
crypto: ghash,poly1305 - select CRYPTO_HASH where needed
crypto: chacha20-ssse3 - Align stack pointer to 64 bytes
PKCS#7: Don't require SpcSpOpusInfo in Authenticode pkcs7 signatures
crypto: caam - make write transactions bufferable on PPC platforms
Linus Torvalds [Mon, 1 Feb 2016 23:21:20 +0000 (15:21 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved
area for storing a struct page array.
2/ Fixes for dax operations on a raw block device to prevent pagecache
collisions with dax mappings.
3/ A fix for pfn_t usage in vm_insert_mixed that lead to a null
pointer de-reference.
These have received build success notification from the kbuild robot
across 153 configs and pass the latest ndctl tests"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
phys_to_pfn_t: use phys_addr_t
mm: fix pfn_t to page conversion in vm_insert_mixed
block: use DAX for partition table reads
block: revert runtime dax control of the raw block device
fs, block: force direct-I/O for dax-enabled block devices
devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
libnvdimm, pfn: fix restoring memmap location
libnvdimm: fix mode determination for e820 devices
Namhyung Kim [Thu, 28 Jan 2016 12:24:54 +0000 (21:24 +0900)]
perf report: Don't show blank lines if entry has no callchain
When all callchains of a hist entry is percent-limited, do not add a
blank line at the end. It makes the entry look like it doesn't have
callchains.
Reported-and-Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160128122454.GA27446@danjae.kornet
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:56 +0000 (00:40 +0900)]
perf hists browser: Fix percent display in callchains
When there's only a single callchain, perf doesn't print its percentage
in front of the symbols. This is because it assumes that the percentage
is same as parents. But if a percent limit is applied, it's possible
that there are actually a couple of child nodes but only one of them is
shown. In this case it should display the percent to prevent
misunderstanding of its percentage is same as the parent's.
For example, let's see the following callchain.
$ perf report --no-children --percent-limit 0.01 --tui
...
- 0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- 0.04% mmap_region
do_mmap_pgoff
- vm_mmap_pgoff
+ 0.02% sys_mmap_pgoff
+ 0.02% vm_mmap
+ 0.02% mprotect_fixup
Current code omits the percent if 'mmap_region' becomes the only node
when percent limit is set to 0.03%, its percent is not 0.06% but users
will assume it incorrectly.
Before:
$ perf report --no-children --percent-limit 0.03 --tui
...
0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- mmap_region
do_mmap_pgoff
vm_mmap_pgoff
After:
$ perf report --no-children --percent-limit 0.03 --tui
...
0.06% sleep [kernel.vmlinux] [k] kmem_cache_alloc_trace
kmem_cache_alloc_trace
- perf_event_mmap
- 0.04% mmap_region
do_mmap_pgoff
vm_mmap_pgoff
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-10-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:55 +0000 (00:40 +0900)]
perf hists browser: Pass parent_total to callchain print functions
Pass parent node's total period to callchain print functions. This info
is needed by later patch to determine whether it can omit percent or not
correctly.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:54 +0000 (00:40 +0900)]
perf hists browser: Fix dump to show correct callchain style
The commit
8c430a348699 ("perf hists browser: Support folded
callchains") missed to update hist_browser__dump() so it always shows
graph-style callchains regardless of current setting.
To fix that, factor out callchain printing code and rename the existing
function which prints graph-style callchain.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes:
8c430a348699 ("perf hists browser: Support folded callchains")
Link: http://lkml.kernel.org/r/1453909257-26015-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:53 +0000 (00:40 +0900)]
perf report: Fix percent display in callchains on --stdio
When there's only a single callchain, perf doesn't print its percentage
in front of the symbols. This is because it assumes that the percentage
is same as parents. But if a percent limit is applied, it's possible
that there are actually a couple of child nodes but only one of them is
shown. In this case it should display the percent to prevent
misunderstanding of its percentage is same as the parent's.
For example, let's see the following callchain.
$ perf report -s comm --percent-limit 0.01 --stdio
...
9.95% swapper
|
|--7.57%--intel_idle
| cpuidle_enter_state
| cpuidle_enter
| call_cpuidle
| cpu_startup_entry
| |
| |--4.89%--start_secondary
| |
| --2.68%--rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
|
|--0.15%--__schedule
| |
| |--0.13%--schedule
| | schedule_preempt_disable
| | cpu_startup_entry
| | |
| | |--0.09%--start_secondary
| | |
| | --0.04%--rest_init
| | start_kernel
| | x86_64_start_reservations
| | x86_64_start_kernel
| |
| --0.01%--schedule_preempt_disabled
| cpu_startup_entry
...
Current code omits the percent if 'intel_idle' becomes the only node
when percent limit is set to 0.5%, its percent is not 9.95% but users
will assume it incorrectly.
Before:
$ perf report --percent-limit 0.5 --stdio
...
9.95% swapper
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--4.89%--start_secondary
|
--2.68%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
After:
$ perf report --percent-limit 0.5 --stdio
...
9.95% swapper
|
--7.57%--intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--4.89%--start_secondary
|
--2.68%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:52 +0000 (00:40 +0900)]
perf callchain: Pass parent_samples to __callchain__fprintf_graph()
Pass hist entry's period to graph callchain print function. This info
is needed by later patch to determine whether it can omit percentage of
top-level node or not.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:51 +0000 (00:40 +0900)]
perf report: Get rid of hist_entry__callchain_fprintf()
It's just a wrapper function to align the start position ofcallchains to
'comm' of each thread if it's a first sort key. But it doesn't not work
with tracepoint events and also with upcoming hierarchy view.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:50 +0000 (00:40 +0900)]
perf report: Apply --percent-limit to callchains also
Currently --percent-limit option only works for hist entries. However
it'd be better to have same effect to callchains as well
Requested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:49 +0000 (00:40 +0900)]
perf hists: Update hists' total period when adding entries
Currently the hist entry addition path doesn't update total_period of
hists and it's calculated during 'resort' path. But the resort path
needs to know the total period before doing its job because it's used
for calculating percent limit of callchains in hist entries.
So this patch update the total period during the addition path. It
makes the percent limit of callchains working (again).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 27 Jan 2016 15:40:48 +0000 (00:40 +0900)]
perf hists: Fix min callchain hits calculation
The total period should be get using hists__total_period() since it
takes filtered entries into account. In addition, if callchain mode is
'fractal', the total period should be the entry's period.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1453909257-26015-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 26 Jan 2016 12:05:20 +0000 (14:05 +0200)]
perf tools: tracepoint_error() can receive e=NULL, robustify it
Fixes segmentation fault using, for instance:
(gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
(gdb) bt
#0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
#1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
at util/parse-events.c:433
#2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
at util/parse-events.c:498
#3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
at util/parse-events.c:936
#4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
#5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
#6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
#7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
#8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
#9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
#10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
#11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
#12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
#13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
#14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
#15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)
Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints. Fix by checking before using.
Committer note:
This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-priviledged users.
Further info from a similar patch by Wang:
The error is in tracepoint_error: it assumes the 'e' parameter is valid.
However, there are many situation a parse_event() can be called without
parse_events_error. See result of
$ grep 'parse_events(.*NULL)' ./tools/perf/ -r'
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tong Zhang <ztong@vt.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes:
196581717d85 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 1 Feb 2016 02:12:16 +0000 (18:12 -0800)]
Linux 4.5-rc2
Linus Torvalds [Mon, 1 Feb 2016 01:36:45 +0000 (17:36 -0800)]
Merge tag 'usb-4.5-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB fixes and new device ids for 4.5-rc2. Nothing
major here, full details are in the shortlog, and all of these have
been in linux-next successfully"
* tag 'usb-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: option: fix Cinterion AHxx enumeration
USB: mxu11x0: fix memory leak on usb_serial private data
USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
USB: serial: option: Adding support for Telit LE922
USB: serial: visor: fix crash on detecting device without write_urbs
USB: visor: fix null-deref at probe
USB: cp210x: add ID for IAI USB to RS485 adaptor
usb: hub: do not clear BOS field during reset device
cdc-acm:exclude Samsung phone 04e8:685d
usb: cdc-acm: send zero packet for intel 7260 modem
usb: cdc-acm: handle unlinked urb in acm read callback
Linus Torvalds [Mon, 1 Feb 2016 01:09:39 +0000 (17:09 -0800)]
Merge tag 'tty-4.5-rc2' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 4.5-rc2.
They resolve a number of reported problems (the ioctl one specifically
has been pointed out by numerous people) and one patch adds some new
device ids for the 8250_pci driver. All have been in linux-next
successfully"
* tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_pci: Add Intel Broadwell ports
staging/speakup: Use tty_ldisc_ref() for paste kworker
n_tty: Fix unsafe reference to "other" ldisc
tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
tty: Retry failed reopen if tty teardown in-progress
tty: Wait interruptibly for tty lock on reopen
Linus Torvalds [Mon, 1 Feb 2016 01:00:27 +0000 (17:00 -0800)]
Merge tag 'staging-4.5-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are some small staging driver fixes for 4.5-rc2.
One of them predated 4.4-final, but I missed that merge window due to
the holliday. The others fix reported issues that have come up
recently. The tty change is needed for the speakup driver fix and has
the ack of the tty driver maintainer as well, i.e. myself :)
All have been in linux-next with no reported issues"
* tag 'staging-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Staging: speakup: fix read scrolled-back VT
Staging: speakup: Fix getting port information
Revert "Staging: panel: usleep_range is preferred over udelay"
iio: adis_buffer: Fix out-of-bounds memory access
Linus Torvalds [Mon, 1 Feb 2016 00:55:04 +0000 (16:55 -0800)]
Merge tag 'driver-core-4.5-rc2' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here's a single driver core fix that resolves an issue a lot of users
have been hitting for a while now. It's been tested a lot and has
been in linux-next successfully for a while"
* tag 'driver-core-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
base/platform: Fix platform drivers with no probe callback
Linus Torvalds [Mon, 1 Feb 2016 00:50:31 +0000 (16:50 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fix from Ralf Baechle:
"Just a single revert for a patch which I had upstreamed out of
sequence"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
Revert "MIPS: bcm63xx: nvram: Remove unused bcm63xx_nvram_get_psi_size() function"
Linus Torvalds [Mon, 1 Feb 2016 00:17:19 +0000 (16:17 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A bit on the largish side due to a series of fixes for a regression in
the x86 vector management which was introduced in 4.3. This work was
started in December already, but it took some time to fix all corner
cases and a couple of older bugs in that area which were detected
while at it
Aside of that a few platform updates for intel-mid, quark and UV and
two fixes for in the mm code:
- Use proper types for pgprot values to avoid truncation
- Prevent a size truncation in the pageattr code when setting page
attributes for large mappings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/mm/pat: Avoid truncation when converting cpa->numpages to address
x86/mm: Fix types used in pgprot cacheability flags translations
x86/platform/quark: Print boundaries correctly
x86/platform/UV: Remove EFI memmap quirk for UV2+
x86/platform/intel-mid: Join string and fix SoC name
x86/platform/intel-mid: Enable 64-bit build
x86/irq: Plug vector cleanup race
x86/irq: Call irq_force_move_complete with irq descriptor
x86/irq: Remove outgoing CPU from vector cleanup mask
x86/irq: Remove the cpumask allocation from send_cleanup_vector()
x86/irq: Clear move_in_progress before sending cleanup IPI
x86/irq: Remove offline cpus from vector cleanup
x86/irq: Get rid of code duplication
x86/irq: Copy vectormask instead of an AND operation
x86/irq: Check vector allocation early
x86/irq: Reorganize the search in assign_irq_vector
x86/irq: Reorganize the return path in assign_irq_vector
x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
x86/irq: Validate that irq descriptor is still active
x86/irq: Fix a race in x86_vector_free_irqs()
...
Linus Torvalds [Sun, 31 Jan 2016 23:49:06 +0000 (15:49 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"The timer departement delivers:
- a regression fix for the NTP code along with a proper selftest
- prevent a spurious timer interrupt in the NOHZ lowres code
- a fix for user space interfaces returning the remaining time on
architectures with CONFIG_TIME_LOW_RES=y
- a few patches to fix COMPILE_TEST fallout"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Set the correct expiry when switching to nohz/lowres mode
clocksource: Fix dependencies for archs w/o HAS_IOMEM
clocksource: Select CLKSRC_MMIO where needed
tick/sched: Hide unused oneshot timer code
kselftests: timers: Add adjtimex SETOFFSET validity tests
ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
hrtimer: Handle remaining time proper for TIME_LOW_RES
clockevents/tcb_clksrc: Prevent disabling an already disabled clock
Linus Torvalds [Sun, 31 Jan 2016 23:44:04 +0000 (15:44 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Three small fixes in the scheduler/core:
- use after free in the numa code
- crash in the numa init code
- a simple spelling fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
pid: Fix spelling in comments
sched/numa: Fix use-after-free bug in the task_numa_compare
sched: Fix crash in sched_init_numa()
Linus Torvalds [Sun, 31 Jan 2016 23:38:27 +0000 (15:38 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"This is much bigger than typical fixes, but Peter found a category of
races that spurred more fixes and more debugging enhancements. Work
started before the merge window, but got finished only now.
Aside of that this contains the usual small fixes to perf and tools.
Nothing particular exciting"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
perf: Remove/simplify lockdep annotation
perf: Synchronously clean up child events
perf: Untangle 'owner' confusion
perf: Add flags argument to perf_remove_from_context()
perf: Clean up sync_child_event()
perf: Robustify event->owner usage and SMP ordering
perf: Fix STATE_EXIT usage
perf: Update locking order
perf: Remove __free_event()
perf/bpf: Convert perf_event_array to use struct file
perf: Fix NULL deref
perf/x86: De-obfuscate code
perf/x86: Fix uninitialized value usage
perf: Fix race in perf_event_exit_task_context()
perf: Fix orphan hole
perf stat: Do not clean event's private stats
perf hists: Fix HISTC_MEM_DCACHELINE width setting
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf tests: Remove wrong semicolon in while loop in CQM test
perf: Synchronously free aux pages in case of allocation failure
...
Linus Torvalds [Sun, 31 Jan 2016 23:29:37 +0000 (15:29 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single commit, which makes the rtmutex.wait_lock an irq safe lock.
This prevents a potential deadlock which can be triggered by the rcu
boosting code from rcu_read_unlock()"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Make wait_lock irq safe
Linus Torvalds [Sun, 31 Jan 2016 22:48:58 +0000 (14:48 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull IRQ fixes from Ingo Molnar:
"Mostly irqchip driver fixes, but also an irq core crash fix and a
build fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mxs: Add missing set_handle_irq()
irqchip/atmel-aic: Fix wrong bit operation for IRQ priority
irqchip/gic-v3-its: Recompute the number of pages on page size change
base: Export platform_msi_domain_[alloc,free]_irqs
of: MSI: Simplify irqdomain lookup
irqdomain: Allow domain lookup with DOMAIN_BUS_WIRED token
irqchip: Fix dependencies for archs w/o HAS_IOMEM
irqchip/s3c24xx: Mark init_eint as __maybe_unused
genirq: Validate action before dereferencing it in handle_irq_event_percpu()
Linus Torvalds [Sun, 31 Jan 2016 22:43:09 +0000 (14:43 -0800)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull debugobjects fix from Ingo Molnar:
"Bump up debugobjects pool limit that bigger s390 systems kept running
into"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Allow bigger number of early boot objects
Linus Torvalds [Sun, 31 Jan 2016 22:38:37 +0000 (14:38 -0800)]
Merge tag 'vfio-v4.5-rc2' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"Use alternate group tracking for no-iommu"
* tag 'vfio-v4.5-rc2' of git://github.com/awilliam/linux-vfio:
vfio/noiommu: Don't use iommu_present() to track fake groups
Linus Torvalds [Sun, 31 Jan 2016 22:29:52 +0000 (14:29 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here are two I2C driver regression fixes. piix4 gets a larger
overhaul fixing the latest refactoring and also an older known issue
as well. designware-pci gets a fix for a bad merge conflict
resolution"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: don't regress on bus names
i2c: designware-pci: use IRQF_COND_SUSPEND flag
i2c: piix4: Fully initialize SB800 before it is registered
i2c: piix4: Fix SB800 locking
Dan Williams [Fri, 22 Jan 2016 17:43:28 +0000 (09:43 -0800)]
phys_to_pfn_t: use phys_addr_t
A dma_addr_t is potentially smaller than a phys_addr_t on some archs.
Don't truncate the address when doing the pfn conversion.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Matthew Wilcox <willy@linux.intel.com>
[willy: fix pfn_t_to_phys as well]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 26 Jan 2016 17:48:05 +0000 (09:48 -0800)]
mm: fix pfn_t to page conversion in vm_insert_mixed
pfn_t_to_page() honors the flags in the pfn_t value to determine if a
pfn is backed by a page. However, vm_insert_mixed() was originally
written to use pfn_valid() to make this determination. To restore the
old/correct behavior, ignore the pfn_t flags in the !pfn_t_devmap() case
and fallback to trusting pfn_valid().
Fixes:
01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
Cc: Dave Hansen <dave@sr71.net>
Cc: David Airlie <airlied@linux.ie>
Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tested-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
David S. Miller [Sat, 30 Jan 2016 23:32:42 +0000 (15:32 -0800)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2016-01-30
Here's a set of important Bluetooth fixes for the 4.5 kernel:
- Two fixes to 6LoWPAN code (one fixing a potential crash)
- Fix LE pairing with devices using both public and random addresses
- Fix allocation of dynamic LE PSM values
- Fix missing COMPATIBLE_IOCTL for UART line discipline
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Fri, 29 Jan 2016 04:25:31 +0000 (20:25 -0800)]
block: use DAX for partition table reads
Avoid populating pagecache when the block device is in DAX mode.
Otherwise these page cache entries collide with the fsync/msync
implementation and break data durability guarantees.
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 29 Jan 2016 04:13:39 +0000 (20:13 -0800)]
block: revert runtime dax control of the raw block device
Dynamically enabling DAX requires that the page cache first be flushed
and invalidated. This must occur atomically with the change of DAX mode
otherwise we confuse the fsync/msync tracking and violate data
durability guarantees. Eliminate the possibilty of DAX-disabled to
DAX-enabled transitions for now and revisit this for the next cycle.
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 26 Jan 2016 01:23:18 +0000 (17:23 -0800)]
fs, block: force direct-I/O for dax-enabled block devices
Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
to a block-device special file. Both regular files and device special
files can use the common filp->f_mapping->host lookup to determing is
DAX is enabled.
Otherwise, we confuse the DAX code that does not expect to find live
data in the page cache:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
__delete_from_page_cache+0x9f6/0xb60()
Modules linked in:
CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<
ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<
ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
[<
ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
[<
ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
[<
ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
[<
ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
[<
ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
[< inline >] wp_pfn_shared mm/memory.c:2208
[<
ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
[< inline >] handle_pte_fault mm/memory.c:3323
[< inline >] __handle_mm_fault mm/memory.c:3417
[<
ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
[<
ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
[<
ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
[<
ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
[<
ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
[<
ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace
dae21e0f85f1f98c ]---
Fixes:
5a023cdba50c ("block: enable dax for raw block devices")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>