Linus Torvalds [Sun, 4 Feb 2018 00:25:42 +0000 (16:25 -0800)]
Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy whitelisting from Kees Cook:
"Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs.
To further restrict what memory is available for copying, this creates
a way to whitelist specific areas of a given slab cache object for
copying to/from userspace, allowing much finer granularity of access
control.
Slab caches that are never exposed to userspace can declare no
whitelist for their objects, thereby keeping them unavailable to
userspace via dynamic copy operations. (Note, an implicit form of
whitelisting is the use of constant sizes in usercopy operations and
get_user()/put_user(); these bypass all hardened usercopy checks since
these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over
the next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test coverage"
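As a rough illustration of the whitelisting idea described above, here is a minimal userspace sketch. It is not the kernel's implementation: struct usercopy_region and usercopy_allowed() are hypothetical names, and the fields only mirror the "offset plus size" window that the pull message describes. A cache with no whitelist is modelled as a window of size 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one slab cache's usercopy whitelist. */
struct usercopy_region {
	size_t useroffset;	/* start of the whitelisted window in the object */
	size_t usersize;	/* length of the window; 0 means "no whitelist" */
};

/*
 * Return true if copying n bytes starting at byte 'offset' of an object
 * stays entirely inside the whitelisted window.
 */
static bool usercopy_allowed(const struct usercopy_region *r,
			     size_t offset, size_t n)
{
	assert(r != NULL);

	if (r->usersize == 0)
		return false;		/* cache never exposed to userspace */
	if (offset < r->useroffset)
		return false;		/* starts before the window */
	if (offset - r->useroffset > r->usersize)
		return false;		/* starts past the end of the window */

	/* Compare against the remaining room to avoid overflow in offset + n. */
	return n <= r->usersize - (offset - r->useroffset);
}
```

With a window of { .useroffset = 16, .usersize = 32 }, a 32-byte copy at offset 16 is accepted, while an 8-byte copy at offset 8 or a 16-byte copy at offset 40 is rejected.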
* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
lkdtm: Update usercopy tests for whitelisting
usercopy: Restrict non-usercopy caches to size 0
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
kvm: whitelist struct kvm_vcpu_arch
arm: Implement thread_struct whitelist for hardened usercopy
arm64: Implement thread_struct whitelist for hardened usercopy
x86: Implement thread_struct whitelist for hardened usercopy
fork: Provide usercopy whitelisting for task_struct
fork: Define usercopy region in thread_stack slab caches
fork: Define usercopy region in mm_struct slab caches
net: Restrict unwhitelisted proto caches to size 0
sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
sctp: Define usercopy region in SCTP proto slab cache
caif: Define usercopy region in caif proto slab cache
ip: Define usercopy region in IP proto slab cache
net: Define usercopy region in struct proto slab cache
scsi: Define usercopy region in scsi_sense_cache slab cache
cifs: Define usercopy region in cifs_request slab cache
vxfs: Define usercopy region in vxfs_inode slab cache
ufs: Define usercopy region in ufs_inode_cache slab cache
...
Linus Torvalds [Sat, 3 Feb 2018 21:55:01 +0000 (13:55 -0800)]
Merge tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore update from Kees Cook:
"Only a header cleanup this release; nice and quiet. :)
- clean up hardirq header usage (Yang Shi)"
* tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fs: pstore: remove unused hardirq.h
Linus Torvalds [Sat, 3 Feb 2018 21:49:22 +0000 (13:49 -0800)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Only miscellaneous cleanups and bug fixes for ext4 this cycle"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: create ext4_kset dynamically
ext4: create ext4_feat kobject dynamically
ext4: release kobject/kset even when init/register fail
ext4: fix incorrect indentation of if statement
ext4: correct documentation for grpid mount option
ext4: use 'sbi' instead of 'EXT4_SB(sb)'
ext4: save error to disk in __ext4_grp_locked_error()
jbd2: fix sphinx kernel-doc build warnings
ext4: fix a race in the ext4 shutdown path
mbcache: make sure c_entry_count is not decremented past zero
ext4: no need flush workqueue before destroying it
ext4: fixed alignment and minor code cleanup in ext4.h
ext4: fix ENOSPC handling in DAX page fault handler
dax: pass detailed error code from dax_iomap_fault()
mbcache: revert "fs/mbcache.c: make count_objects() more robust"
mbcache: initialize entry->e_referenced in mb_cache_entry_create()
ext4: fix up remaining files with SPDX cleanups
Linus Torvalds [Sat, 3 Feb 2018 21:46:14 +0000 (13:46 -0800)]
Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull dmi subsystem updates/fixes from Jean Delvare.
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: handle missing DMI data gracefully
firmware: dmi_scan: Fix handling of empty DMI strings
firmware: dmi_scan: Drop dmi_initialized
firmware: dmi: Optimize dmi_matches
Linus Torvalds [Sat, 3 Feb 2018 21:44:29 +0000 (13:44 -0800)]
Merge branch 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull integrity fixes from James Morris:
- add James Bottomley as a Trusted Keys maintainer.
- IMA: re-initialize iint->atomic_flags on iint_free(), from Mimi.
* 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ima: re-initialize iint->atomic_flags
maintainers: update trusted keys
Linus Torvalds [Sat, 3 Feb 2018 21:16:55 +0000 (13:16 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) The bnx2x can hang if you give it a GSO packet with a segment size
which is too big for the hardware, detect and drop in this case.
From Daniel Axtens.
2) Fix some overflows and pointer leaks in xtables, from Dmitry Vyukov.
3) Missing RCU locking in igmp, from Eric Dumazet.
4) Fix RX checksum handling on r8152, it can only checksum UDP and TCP
packets. From Hayes Wang.
5) Minor pacing tweak to TCP BBR congestion control, from Neal
Cardwell.
6) Missing RCU annotations in cls_u32, from Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
Revert "defer call to mem_cgroup_sk_alloc()"
soreuseport: fix mem leak in reuseport_add_sock()
net: qlge: use memmove instead of skb_copy_to_linear_data
net: qed: use correct strncpy() size
net: cxgb4: avoid memcpy beyond end of source buffer
cls_u32: add missing RCU annotation.
r8152: set rx mode early when linking on
r8152: fix wrong checksum status for received IPv4 packets
nfp: fix TLV offset calculation
net: pxa168_eth: add netconsole support
net: igmp: add a missing rcu locking section
ibmvnic: fix firmware version when no firmware level has been provided by the VIOS server
vmxnet3: remove redundant initialization of pointer 'rq'
lan78xx: remove redundant initialization of pointer 'phydev'
net: jme: remove unused initialization of 'rxdesc'
rtnetlink: remove check for IFLA_IF_NETNSID
rocker: fix possible null pointer dereference in rocker_router_fib_event_work
inet: Avoid unitialized variable warning in inet_unhash()
net: bridge: Fix uninitialized error in br_fdb_sync_static()
openvswitch: Remove padding from packet before L3+ conntrack processing
...
Linus Torvalds [Sat, 3 Feb 2018 21:14:41 +0000 (13:14 -0800)]
Merge tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 fixes from Bob Peterson:
"Andreas Gruenbacher wrote two additional patches that we would like
merged in this time. Both are regressions:
- fix another kernel build dependency problem
- fix a performance regression in glock dumps"
* tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Glock dump performance regression fix
gfs2: Fix the crc32c dependency
Linus Torvalds [Sat, 3 Feb 2018 21:07:56 +0000 (13:07 -0800)]
Merge tag 'scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull second set of SCSI updates from James Bottomley:
"This is a set of three patches that depended on mq and zone changes in
the block tree (now upstream)"
* tag 'scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Remove zone write locking
scsi: sd_zbc: Initialize device request queue zoned data
scsi: scsi-mq-debugfs: Show more information
Linus Torvalds [Sat, 3 Feb 2018 21:01:19 +0000 (13:01 -0800)]
Merge tag 'linux-kselftest-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"This update to Kselftest consists of fixes, cleanups, and SPDX license
additions"
* tag 'linux-kselftest-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: vm: update .gitignore with missing generated file
selftests/x86: Add <test_name>{,_32,_64} targets
selftests: Fix loss of test output in run_kselftests.sh
selftest: ftrace: Fix to add 256 kprobe events correctly
selftest: ftrace: Fix to pick text symbols for kprobes
selftests: media_tests: Add SPDX license identifier
selftests: kselftest.h: Add SPDX license identifier
selftests: kselftest_install.sh: Add SPDX license identifier
selftests: gen_kselftest_tar.h: Add SPDX license identifier
selftests: media_tests: Fix Makefile 'clean' target warning
tools/testing: Fix trailing semicolon
kselftest: fix OOM in memory compaction test
selftests: seccomp: fix compile error seccomp_bpf
Linus Torvalds [Sat, 3 Feb 2018 00:44:14 +0000 (16:44 -0800)]
pinctrl: remove include file from <linux/device.h>
When pulling the recent pinctrl merge, I was surprised by how a
pinctrl-only pull request ended up rebuilding basically the whole
kernel.
The reason for that ended up being that <linux/device.h> included
<linux/pinctrl/devinfo.h>, so any change to that file ended up causing
pretty much every driver out there to be rebuilt.
The reason for that was because 'struct device' has this in it:
#ifdef CONFIG_PINCTRL
struct dev_pin_info *pins;
#endif
but we already avoid header includes for these kinds of things in that
header file, preferring to just use a forward-declaration of the
structure instead. Exactly to avoid this kind of header dependency.
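To make the forward-declaration point concrete, here is a small standalone sketch. struct device_sketch and device_has_pins() are made-up stand-ins, not the kernel's definitions; the only thing borrowed from the message above is the 'struct dev_pin_info *pins' member.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Forward declaration: the compiler only needs to know that the type
 * exists in order to declare a pointer to it. No header is included.
 */
struct dev_pin_info;

/* Simplified stand-in for 'struct device'; not the kernel definition. */
struct device_sketch {
	const char *name;
	struct dev_pin_info *pins;	/* layout of *pins is not needed here */
};

/* Code that only passes the pointer around compiles without the full type. */
static int device_has_pins(const struct device_sketch *dev)
{
	assert(dev != NULL);
	return dev->pins != NULL;
}
```

Only the files that actually dereference 'pins' need the full structure definition, so a change to that definition no longer forces a rebuild of everything that sees the containing structure.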
Since some drivers seem to expect that <linux/pinctrl/devinfo.h> header
to come in automatically, move the include to <linux/pinctrl/pinctrl.h>
instead. It might be better to just make the includes more targeted,
but I'm not going to review every driver.
It would definitely be good to have a tool for finding and minimizing
header dependencies automatically - or at least help with them. Right
now we almost certainly end up having way too many of these things, and
it's hard to test every single configuration.
FWIW, you can get a sense of the "hotness" of a header file with something
like this after doing a full build:
find . -name '.*.o.cmd' -print0 |
xargs -0 tail --lines=+2 |
grep -v 'wildcard ' |
tr ' \\' '\n' |
sort | uniq -c | sort -n | less -S
which isn't exact (there are other things in those '*.o.cmd' than just
the dependencies, and the "--lines=+2" only removes the header), but
might be a useful approximation.
With this patch, <linux/pinctrl/devinfo.h> drops to "only" having 833
users in the current x86-64 allmodconfig. In contrast, <linux/device.h>
has 14857 build files including it directly or indirectly.
Of course, the headers that absolutely _everybody_ includes (things like
<linux/types.h> etc) get a score of 23000+.
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ard Biesheuvel [Sat, 3 Feb 2018 10:25:20 +0000 (11:25 +0100)]
firmware: dmi: handle missing DMI data gracefully
Currently, when booting a kernel with DMI support on a platform that has
no DMI tables, the following output is emitted into the kernel log:
[ 0.128818] DMI not present or invalid.
...
[ 1.306659] dmi: Firmware registration failed.
...
[ 2.908681] dmi-sysfs: dmi entry is absent.
The first one is a pr_info(), but the subsequent ones are pr_err()s that
complain about a condition that is not really an error to begin with.
So let's clean this up, and give up silently if dmi_available is not set.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Hundebøll <mnhu@prevas.dk>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Jean Delvare [Sat, 3 Feb 2018 10:25:20 +0000 (11:25 +0100)]
firmware: dmi_scan: Fix handling of empty DMI strings
The handling of empty DMI strings looks quite broken to me:
* Strings from 1 to 7 spaces are not considered empty.
* True empty DMI strings (string index set to 0) are not considered
empty, and result in allocating a 0-char string.
* Strings with invalid index also result in allocating a 0-char
string.
* Strings starting with 8 spaces are all considered empty, even if
non-space characters follow (sounds like a weird thing to do, but
I have actually seen occurrences of this in DMI tables before.)
* Strings which are considered empty are reported as 8 spaces,
instead of being actually empty.
Some of these issues are the result of an off-by-one error in memcmp,
the rest is incorrect by design.
So let's get it square: missing strings and strings made of only
spaces, regardless of their length, should be treated as empty and
no memory should be allocated for them. All other strings are
non-empty and should be allocated.
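The rule stated above (missing strings and strings made only of spaces, of any length, count as empty) can be sketched as a small standalone check. string_is_blank() is a hypothetical helper written for illustration; it is not the function used in dmi_scan.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true for a missing string (NULL), a zero-length string, or a
 * string consisting only of spaces, regardless of how many there are.
 */
static bool string_is_blank(const char *s)
{
	if (s == NULL)
		return true;

	/* Skip every leading space instead of comparing a fixed count. */
	while (*s == ' ')
		s++;

	/* Blank only if nothing but the terminator follows the spaces. */
	return *s == '\0';
}
```

Because the loop does not stop at a fixed number of spaces, a string of 8 spaces followed by other characters is treated as non-empty, and a string of 1 to 7 spaces is treated as empty.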
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 79da4721117f ("x86: fix DMI out of memory problems")
Cc: Parag Warudkar <parag.warudkar@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Jean Delvare [Sat, 3 Feb 2018 10:25:20 +0000 (11:25 +0100)]
firmware: dmi_scan: Drop dmi_initialized
I don't think it makes sense to check for a possible bad
initialization order at run time on every system when it is all
decided at build time.
A more efficient way to make sure developers do not introduce new
calls to dmi_check_system() too early in the initialization sequence
is to simply document the expected call order. That way, developers
have a chance to get it right immediately, without having to
test-boot their kernel, wonder why it does not work, and parse the
kernel logs for a warning message. And we get rid of the run-time
performance penalty as a nice side effect.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Jean Delvare [Sat, 3 Feb 2018 10:25:20 +0000 (11:25 +0100)]
firmware: dmi: Optimize dmi_matches
Function dmi_matches can be made a bit faster:
* The documented purpose of dmi_initialized is to catch too early
calls to dmi_check_system(). I'm not fully convinced it justifies
slowing down the initialization of all systems out there, but at
least the check should not have been moved from dmi_check_system()
to dmi_matches(). dmi_matches() is being called for every entry of
the table passed to dmi_check_system(), causing the same redundant
check to be performed again and again. So move it back to
dmi_check_system(), reverting this specific portion of commit
d7b1956fed33 ("DMI: Introduce dmi_first_match to make the interface
more flexible").
* Don't check for the exact_match flag again when we already know its
value.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: d7b1956fed33 ("DMI: Introduce dmi_first_match to make the interface more flexible")
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Jeff Garzik <jgarzik@redhat.com>
Roman Gushchin [Fri, 2 Feb 2018 15:26:57 +0000 (15:26 +0000)]
Revert "defer call to mem_cgroup_sk_alloc()"
This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()").
Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
memcg socket memory accounting, as packets received before memcg
pointer initialization are not accounted and are causing refcounting
underflow on socket release.
Actually, the use-after-free problem was fixed by commit c0576e397508
("net: call cgroup_sk_alloc() earlier in sk_clone_lock()") for the
cgroup pointer.
So, let's revert it and call mem_cgroup_sk_alloc() just before
cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
we're cloning, and it holds a reference to the memcg.
Also, let's drop BUG_ON(mem_cgroup_is_root()) check from
mem_cgroup_sk_alloc(). I see no reasons why bumping the root
memcg counter is a good reason to panic, and there are no realistic
ways to hit it.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 2 Feb 2018 18:27:27 +0000 (10:27 -0800)]
soreuseport: fix mem leak in reuseport_add_sock()
reuseport_add_sock() needs to deal with attaching a socket having
its own sk_reuseport_cb, after a prior
setsockopt(SO_ATTACH_REUSEPORT_?BPF)
Without this fix, not only a WARN_ONCE() was issued, but we were also
leaking memory.
Thanks to syzbot and Eric Biggers for providing us nice C repros.
------------[ cut here ]------------
socket already in reuseport group
WARNING: CPU: 0 PID: 3496 at net/core/sock_reuseport.c:119
reuseport_add_sock+0x742/0x9b0 net/core/sock_reuseport.c:117
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 3496 Comm: syzkaller869503 Not tainted 4.15.0-rc6+ #245
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x211/0x2d0 lib/bug.c:184
fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079
Fixes: ef456144da8e ("soreuseport: define reuseport groups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+c0ea2226f77a42936bf7@syzkaller.appspotmail.com
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 2 Feb 2018 15:45:44 +0000 (16:45 +0100)]
net: qlge: use memmove instead of skb_copy_to_linear_data
gcc-8 points out that the skb_copy_to_linear_data() argument points to
the skb itself, which makes it run into a problem with overlapping
memcpy arguments:
In file included from include/linux/ip.h:20,
from drivers/net/ethernet/qlogic/qlge/qlge_main.c:26:
drivers/net/ethernet/qlogic/qlge/qlge_main.c: In function 'ql_realign_skb':
include/linux/skbuff.h:3378:2: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
memcpy(skb->data, from, len);
It's unclear to me what the best solution is, maybe it ought to use a
different helper that adjusts the skb data in a safe way. Simply using
memmove() here seems like the easiest workaround.
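For readers unfamiliar with the distinction, here is a minimal standalone sketch of an in-place shift where source and destination overlap. shift_down() is a hypothetical helper, not the qlge code; it only shows why memmove() is the safe call in that situation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Move 'len' bytes that start 'shift' bytes into 'buf' down to the start
 * of 'buf'. Source and destination overlap whenever shift < len, which
 * memcpy() does not permit; memmove() is defined for overlapping ranges.
 */
static void shift_down(unsigned char *buf, size_t shift, size_t len)
{
	memmove(buf, buf + shift, len);
}
```

Calling shift_down(buf, 2, 6) on an 8-byte buffer holding 0..7 leaves 2..7 in the first six bytes; the last two bytes are untouched.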
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 2 Feb 2018 15:44:47 +0000 (16:44 +0100)]
net: qed: use correct strncpy() size
Passing the strlen() of the source string as the destination
length is pointless, and gcc-8 now warns about it:
drivers/net/ethernet/qlogic/qed/qed_debug.c: In function 'qed_grc_dump':
include/linux/string.h:253: error: 'strncpy' specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
This changes qed_grc_dump_big_ram() to instead use the length of
the destination buffer, and to use strscpy() to guarantee nul-termination.
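A userspace sketch of the same idea, bounding the copy by the destination size and always terminating, is shown below. copy_bounded() is a hypothetical stand-in written for this illustration; it is not the kernel's strscpy() and its return value convention differs (it returns the number of bytes copied, even on truncation).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'src' into 'dst', never writing more than 'dst_size' bytes and
 * always leaving 'dst' nul-terminated when dst_size > 0.
 * Returns the number of characters copied, excluding the terminator.
 */
static size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
	size_t n;

	assert(dst != NULL && src != NULL);

	if (dst_size == 0)
		return 0;

	n = strlen(src);
	if (n >= dst_size)
		n = dst_size - 1;	/* truncate, leaving room for '\0' */

	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

The bound comes from sizeof() of the destination, not from strlen() of the source, so an over-long source is truncated rather than overflowing the buffer.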
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 2 Feb 2018 15:18:37 +0000 (16:18 +0100)]
net: cxgb4: avoid memcpy beyond end of source buffer
Building with link-time-optimizations revealed that the cxgb4 driver does
a fixed-size memcpy() from a variable-length constant string into the
network interface name:
In function 'memcpy',
inlined from 'cfg_queues_uld.constprop' at drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:335:2,
inlined from 'cxgb4_register_uld.constprop' at drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:719:9:
include/linux/string.h:350:3: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
__read_overflow2();
^
I can see two equally workable solutions: either we use a strncpy() instead
of the memcpy() to stop at the end of the input, or we make the source buffer
fixed length as well. This implements the latter.
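The second option, making the source buffer fixed length, can be sketched as follows. NAME_LEN, uld_name and copy_name() are illustrative names only and do not come from the cxgb4 driver.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16	/* assumed size, chosen for the example */

/*
 * Fixed-length source: a short initializer is zero-padded by the compiler
 * up to NAME_LEN bytes, so the whole array is valid to read.
 */
static const char uld_name[NAME_LEN] = "demo";

static void copy_name(char dst[NAME_LEN])
{
	assert(dst != NULL);

	/* Reads exactly NAME_LEN bytes, all of which belong to uld_name. */
	memcpy(dst, uld_name, NAME_LEN);
}
```

With a plain string literal as the source, the same fixed-size memcpy() would read past the end of the literal whenever it is shorter than NAME_LEN.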
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Feb 2018 15:02:22 +0000 (16:02 +0100)]
cls_u32: add missing RCU annotation.
In a couple of points of the control path, n->ht_down is currently
accessed without the required RCU annotation. The accesses are
safe, but sparse complains. Since we already hold the
rtnl lock, let's use rtnl_dereference().
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Fixes: de5df63228fc ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Feb 2018 00:19:00 +0000 (19:19 -0500)]
Merge branch 'r8152-fix-rx-issues'
Hayes Wang says:
====================
r8152: fix rx issues
These two patches fix rx issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 2 Feb 2018 08:43:36 +0000 (16:43 +0800)]
r8152: set rx mode early when linking on
Set the rx mode before calling netif_wake_queue() when the link comes up,
so that the device does not miss incoming packets.
Transmission may start as soon as netif_wake_queue() is called, and the
response packets may arrive before rtl8152_set_rx_mode(), which lets the
device receive packets, has been called. Those response packets would
then be missed.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 2 Feb 2018 08:43:35 +0000 (16:43 +0800)]
r8152: fix wrong checksum status for received IPv4 packets
The device can only verify the checksum of TCP and UDP packets. Therefore,
for IPv4 packets other than TCP and UDP, the checksum still needs to be
checked, even if the IP checksum is correct.
Take ICMP for example: the IP checksum may be correct, but the ICMP checksum
may be wrong.
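The decision described above can be sketched in standalone C. This is not the r8152 driver code: enum csum_status, rx_csum_status() and the hw_l4_ok flag are hypothetical names; only the IP protocol numbers (ICMP 1, TCP 6, UDP 17) are standard values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result: does the stack still have to verify the checksum? */
enum csum_status {
	CSUM_NONE,		/* not verified by hardware, stack must check */
	CSUM_UNNECESSARY	/* hardware verified the transport checksum */
};

#define PROTO_ICMP	1
#define PROTO_TCP	6
#define PROTO_UDP	17

/*
 * Trust the hardware result only for the protocols it can actually
 * checksum. A valid IP header checksum says nothing about, for example,
 * the ICMP checksum carried inside the packet.
 */
static enum csum_status rx_csum_status(uint8_t ip_proto, bool hw_l4_ok)
{
	if ((ip_proto == PROTO_TCP || ip_proto == PROTO_UDP) && hw_l4_ok)
		return CSUM_UNNECESSARY;

	return CSUM_NONE;
}
```

An ICMP packet therefore always comes back as CSUM_NONE here, no matter what the hardware reported.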
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Fri, 2 Feb 2018 03:41:43 +0000 (19:41 -0800)]
nfp: fix TLV offset calculation
The data pointer in the config space TLV parser already includes
NFP_NET_CFG_TLV_BASE, it should not be added again. Incorrect
offset values were only used in printed user output, rendering
the bug merely cosmetic.
Fixes: 73a0329b057e ("nfp: add TLV capabilities to the BAR")
Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Feb 2018 22:57:44 +0000 (14:57 -0800)]
Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Stefan Richter
- make JMicron JMB38x controllers work with IOMMU-equipped systems
- IP-over-1394: allow user-configured MTU of up to 4096 bytes
* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire-ohci: work around oversized DMA reads on JMicron controllers
firewire: net: max MTU off by one
Linus Torvalds [Fri, 2 Feb 2018 22:22:53 +0000 (14:22 -0800)]
Merge tag 'pinctrl-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"This is the bulk of pin control changes for the v4.16 kernel cycle.
Like with GPIO it is actually a bit calm this time.
Core changes:
- After lengthy discussions and partly due to my ignorance, we have
merged a patch making pinctrl_force_default() and
pinctrl_force_sleep() reprogram the states into the hardware of any
hogged pins, even if they are already in the desired state.
This only applies to hogged pins since groups of pins owned by
drivers need to be managed by each driver, lest they could not do
things like runtime PM and put pins to sleeping state even if the
system as a whole is not in sleep.
New drivers:
- New driver for the Microsemi Ocelot SoC. This is used in ethernet
switches.
- The X-Powers AXP209 GPIO driver was extended to also deal with pin
control and moved over from the GPIO subsystem. This circuit is a
mixed-mode integrated circuit which is part of AllWinner designs.
- New subdriver for the Qualcomm MSM8998 SoC, the core of a high-end
mobile device (phone) chipset.
- New subdriver for the ST Microelectronics STM32MP157 MPU and
STM32F769 MCU from the STM32 family.
- New subdriver for the MediaTek MT7622 SoC. This is used for
routers, repeaters, gateways and such network infrastructure.
- New subdriver for the NXP (former Freescale) i.MX 6ULL. This SoC
has multimedia features and targets "smart devices", I guess in-car
entertainment, in-flight entertainment, industrial control panels
etc.
General improvements:
- Incremental improvements on the SH-PFC subdrivers for things like
the CAN bus.
- Enable the glitch filter on Baytrail GPIOs used for interrupts.
- Proper handling of pins to GPIO ranges on the Semtech SX150X
- An IRQ setup ordering fix on MCP23S08.
- A good set of janitorial coding style fixes"
* tag 'pinctrl-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (102 commits)
pinctrl: mcp23s08: fix irq setup order
pinctrl: Forward declare struct device
pinctrl: sunxi: Use of_clk_get_parent_count() instead of open coding
pinctrl: stm32: add STM32F769 MCU support
pinctrl: sx150x: Add a static gpio/pinctrl pin range mapping
pinctrl: sx150x: Register pinctrl before adding the gpiochip
pinctrl: sx150x: Unregister the pinctrl on release
pinctrl: ingenic: Remove redundant dev_err call in ingenic_pinctrl_probe()
pinctrl: sprd: Use seq_putc() in sprd_pinconf_group_dbg_show()
pinctrl: pinmux: Use seq_putc() in pinmux_pins_show()
pinctrl: abx500: Use seq_putc() in abx500_gpio_dbg_show()
pinctrl: mediatek: mt7622: align error handling of mtk_hw_get_value call
pinctrl: mediatek: mt7622: fix potential uninitialized value being returned
pinctrl: uniphier: refactor drive strength get/set functions
pinctrl: imx7ulp: constify struct imx_cfg_params_decode
pinctrl: imx: constify struct imx_pinctrl_soc_info
pinctrl: imx7d: simplify imx7d_pinctrl_probe
pinctrl: imx: use struct imx_pinctrl_soc_info as a const
pinctrl: sunxi-pinctrl: fix pin funtion can not be match correctly.
pinctrl: qcom: Add msm8998 pinctrl driver
...
Linus Torvalds [Fri, 2 Feb 2018 22:19:19 +0000 (14:19 -0800)]
Merge tag 'rtc-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Not much this cycle. I've pushed the at32ap700x removal late but it is
unlikely to cause any issues.
Summary:
Subsystem:
- Move ABI documentation to Documentation/ABI
New driver:
- NXP i.MX53 SRTC
- Chrome OS EC RTC
Drivers:
- Remove at32ap700x
- Many fixes in various error paths"
* tag 'rtc-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: remove rtc-at32ap700x
Documentation: rtc: move iotcl interface documentation to ABI
Documentation: rtc: add sysfs file permissions
Documentation: rtc: move sysfs documentation to ABI
rtc: mxc_v2: remove __exit annotation
rtc: mxc_v2: Remove unnecessary platform_get_resource() error check
rtc: add mxc driver for i.MX53 SRTC
dt-bindings: rtc: add bindings for i.MX53 SRTC
rtc: r7301: Fix a possible sleep-in-atomic bug in rtc7301_set_time
rtc: r7301: Fix a possible sleep-in-atomic bug in rtc7301_read_time
rtc: omap: fix unbalanced clk_prepare_enable/clk_disable_unprepare
rtc: ac100: Fix multiple race conditions
rtc: sun6i: ensure rtc is kfree'd on error
rtc: cros-ec: add cros-ec-rtc driver.
mfd: cros_ec: Introduce RTC commands and events definitions.
rtc: stm32: Fix copyright
rtc: Remove unused RTC_DEVICE_NAME_SIZE
rtc: r9701: Remove r9701_remove function
rtc: brcmstb-waketimer: fix error handling in brcmstb_waketmr_probe()
Linus Torvalds [Fri, 2 Feb 2018 21:46:21 +0000 (13:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
"A few small fixes and clean ups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: fix crash if pthread_create races with signal delivery
alpha: fix formating of stack content
alpha: fix reboot on Avanti platform
alpha: deprecate pci_get_bus_and_slot()
alpha: Fix mixed up args in EXC macro in futex operations
alpha: osf_sys.c: use timespec64 where appropriate
alpha: osf_sys.c: fix put_tv32 regression
alpha: make thread_saved_pc static
alpha: make XTABS equivalent to TAB3
Linus Torvalds [Fri, 2 Feb 2018 18:01:04 +0000 (10:01 -0800)]
Merge tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Enable support for memory protection keys aka "pkeys" on Power7/8/9
when using the hash table MMU.
- Extend our interrupt soft masking to support masking PMU interrupts
as well as "normal" interrupts, and then use that to implement
local_t for a ~4x speedup vs the current atomics-based
implementation.
- A new driver "ocxl" for "Open Coherent Accelerator Processor
Interface (OpenCAPI)" devices.
- Support for new device tree properties on PowerVM to describe
hotpluggable memory and devices.
- Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit
VDSO.
- Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI
erratum workaround, plus a minor cleanup patch.
As well as quite a lot of other changes all over the place, and small
fixes and cleanups as always.
Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy,
Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V,
Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann,
Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G.
Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur,
David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic
Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva,
Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh
Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal,
Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael
Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud,
Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee,
Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann,
Vaibhav Jain, Vasyl Gomonovych"
* tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits)
powerpc/mm/radix: Fix build error when RADIX_MMU=n
macintosh/ams-input: Use true and false for boolean values
macintosh: change some data types from int to bool
powerpc/watchdog: Print the NIP in soft_nmi_interrupt()
powerpc/watchdog: regs can't be null in soft_nmi_interrupt()
powerpc/watchdog: Tweak watchdog printks
powerpc/cell: Remove axonram driver
rtc-opal: Fix handling of firmware error codes, prevent busy loops
powerpc/mpc52xx_gpt: make use of raw_spinlock variants
macintosh/adb: Properly mark continued kernel messages
powerpc/pseries: Fix cpu hotplug crash with memoryless nodes
powerpc/numa: Ensure nodes initialized for hotplug
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
powerpc/kernel: Block interrupts when updating TIDR
powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
powerpc/mm/nohash: do not flush the entire mm when range is a single page
powerpc/pseries: Add Initialization of VF Bars
powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
powerpc/eeh: Add EEH notify resume sysfs
powerpc/eeh: Add EEH operations to notify resume
...
Linus Torvalds [Fri, 2 Feb 2018 17:50:51 +0000 (09:50 -0800)]
Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
- StrongARM SA1111 updates to modernise and remove cruft
- Add StrongARM gpio drivers for board GPIOs
- Verify size of zImage is what we expect to avoid issues with
appended DTB
- nommu updates from Vladimir Murzin
- page table read-write-execute checking from Jinbum Park
- Broadcom Brahma-B15 cache updates from Florian Fainelli
- Avoid failure with kprobes test caused by inappropriately
placed kprobes
- Remove __memzero optimisation (which was incorrectly being
used directly by some drivers)
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (32 commits)
ARM: 8745/1: get rid of __memzero()
ARM: 8744/1: don't discard memblock for kexec
ARM: 8743/1: bL_switcher: add MODULE_LICENSE tag
ARM: 8742/1: Always use REFCOUNT_FULL
ARM: 8741/1: B15: fix unused label warnings
ARM: 8740/1: NOMMU: Make sure we do not hold stale data in mem[] array
ARM: 8739/1: NOMMU: Setup VBAR/Hivecs for secondaries cores
ARM: 8738/1: Disable CONFIG_DEBUG_VIRTUAL for NOMMU
ARM: 8737/1: mm: dump: add checking for writable and executable
ARM: 8736/1: mm: dump: make the page table dumping seq_file
ARM: 8735/1: mm: dump: make page table dumping reusable
ARM: sa1100/neponset: add GPIO drivers for control and modem registers
ARM: sa1100/assabet: add BCR/BSR GPIO driver
ARM: 8734/1: mm: idmap: Mark variables as ro_after_init
ARM: 8733/1: hw_breakpoint: Mark variables as __ro_after_init
ARM: 8732/1: NOMMU: Allow userspace to access background MPU region
ARM: 8727/1: MAINTAINERS: Update brcmstb entries to cover B15 code
ARM: 8728/1: B15: Register reboot notifier for KEXEC
ARM: 8730/1: B15: Add suspend/resume hooks
ARM: 8726/1: B15: Add CPU hotplug awareness
...
Linus Torvalds [Fri, 2 Feb 2018 17:48:36 +0000 (09:48 -0800)]
Merge tag 'microblaze-4.16-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull microblaze updates from Michal Simek:
- Fix endian handling and Kconfig dependency
- Fix iounmap prototype
* tag 'microblaze-4.16-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Setup proper dependency for optimized lib functions
microblaze: fix iounmap prototype
microblaze: fix endian handling
Mimi Zohar [Tue, 23 Jan 2018 15:00:41 +0000 (10:00 -0500)]
ima: re-initialize iint->atomic_flags
Intermittently security.ima is not being written for new files. This
patch re-initializes the new slab iint->atomic_flags field before
freeing it.
Fixes: commit 0d73a55208e9 ("ima: re-introduce own integrity cache lock")
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Mimi Zohar [Thu, 1 Feb 2018 03:14:36 +0000 (22:14 -0500)]
maintainers: update trusted keys
Adding James Bottomley as the new maintainer for trusted keys.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Fri, 2 Feb 2018 01:48:47 +0000 (17:48 -0800)]
Merge tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This seems to have been a comparatively quieter merge window, I assume
due to holidays etc. The "biggest" change is AMD header cleanups, which
merge/remove a bunch of them. The AMD gpu scheduler is now being made generic
with the etnaviv driver wanting to reuse the code, hopefully other drivers
can go in the same direction.
Otherwise it's the usual lots of stuff in i915/amdgpu, not so much stuff
elsewhere.
Core:
- Add .last_close and .output_poll_changed helpers to reduce driver footprints
- Fix plane clipping
- Improved debug printing support
- Add panel orientation property
- Update edid derived properties at edid setting
- Reduction in fbdev driver footprint
- Move amdgpu scheduler into core for other drivers to use.
i915:
- Selftest and IGT improvements
- Fast boot prep work on IPS, pipe config
- HW workarounds for Cannonlake, Geminilake
- Cannonlake clock and HDMI2.0 fixes
- GPU cache invalidation and context switch improvements
- Display planes cleanup
- New PMU interface for perf queries
- New firmware support for KBL/SKL
- Geminilake HW workaround for performance
- Coffeelake stolen memory improvements
- GPU reset robustness work
- Cannonlake horizontal plane flipping
- GVT work
amdgpu/radeon:
- RV and Vega header file cleanups (lots of lines gone!)
- TTM operation context support
- 48-bit GPUVM support for Vega/RV
- ECC support for Vega
- Resizeable BAR support
- Multi-display sync support
- Enable swapout for reserved BOs during allocation
- S3 fixes on Raven
- GPU reset cleanup and fixes
- 2+1 level GPU page table
amdkfd:
- GFX7/8 SDMA user queues support
- Hardware scheduling for multiple processes
- dGPU prep work
rcar:
- Added R8A7743/5 support
- System suspend/resume support
sun4i:
- Multi-plane support for YUV formats
- A83T and LVDS support
msm:
- Devfreq support for GPU
tegra:
- Prep work for adding Tegra186 support
- Tegra186 HDMI support
- HDMI2.0 and zpos support by using generic helpers
tilcdc:
- Misc fixes
omapdrm:
- Support memory bandwidth limits
- DSI command mode panel cleanups
- DMM error handling
exynos:
- drop the old IPP subdriver.
etnaviv:
- Occlusion query fixes
- Job handling fixes
- Prep work for hooking in gpu scheduler
armada:
- Move closer to atomic modesetting
- Allow disabling primary plane if overlay is full screen
imx:
- Format modifier support
- Add tile prefetch to PRE
- Runtime PM support for PRG
ast:
- fix LUT loading"
* tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linux: (1471 commits)
drm/ast: Load lut in crtc_commit
drm: Check for lessee in DROP_MASTER ioctl
drm: fix gpu scheduler link order
drm/amd/display: Demote error print to debug print when ATOM impl missing
dma-buf: fix reservation_object_wait_timeout_rcu once more v2
drm/amdgpu: Avoid leaking PM domain on driver unbind (v2)
drm/amd/amdgpu: Add Polaris version check
drm/amdgpu: Reenable manual GPU reset from sysfs
drm/amdgpu: disable MMHUB power gating on raven
drm/ttm: Don't unreserve swapped BOs that were previously reserved
drm/ttm: Don't add swapped BOs to swap-LRU list
drm/amdgpu: only check for ECC on Vega10
drm/amd/powerplay: Fix smu_table_entry.handle type
drm/ttm: add VADDR_FLAG_UPDATED_COUNT to correctly update dma_page global count
drm: Fix PANEL_ORIENTATION_QUIRKS breaking the Kconfig DRM menuconfig
drm/radeon: fill in rb backend map on evergreen/ni.
drm/amdgpu/gfx9: fix ngg enablement to clear gds reserved memory (v2)
drm/ttm: only free pages rather than update global memory count together
drm/amdgpu: fix CPU based VM updates
drm/amdgpu: fix typo in amdgpu_vce_validate_bo
...
Linus Torvalds [Fri, 2 Feb 2018 00:56:07 +0000 (16:56 -0800)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core framework has a handful of patches this time around, mostly
due to the clk rate protection support added by Jerome Brunet.
This feature will allow consumers to lock in a certain rate on the
output of a clk so that things like audio playback don't hear pops
when the clk frequency changes due to shared parent clks changing
rates. Currently the clk API doesn't guarantee the rate of a clk stays
at the rate you request after clk_set_rate() is called, so this new
API will allow drivers to express that requirement.
Beyond this, the core got some debugfs pretty printing patches and a
couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h
file causes small one-liners in all the architecture Kbuild files.
Overall, the driver diff seems to be the normal stuff that comes all
the time to fix little problems here and there and to support new
hardware.
Summary:
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: aspeed: Handle inverse polarity of USB port 1 clock gate
clk: aspeed: Fix return value check in aspeed_cc_init()
clk: aspeed: Add reset controller
clk: aspeed: Register gated clocks
clk: aspeed: Add platform driver and register PLLs
clk: aspeed: Register core clocks
clk: Add clock driver for ASPEED BMC SoCs
clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
clk: fix reentrancy of clk_enable() on UP systems
clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
clk: Simplify debugfs registration
clk: Fix debugfs_create_*() usage
clk: Show symbolic clock flags in debugfs
clk: renesas: r8a7796: Add FDP clock
clk: Move __clk_{get,put}() into private clk.h API
clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
clk: Improve flags doc for of_clk_detect_critical()
arch: Remove clkdev.h asm-generic from Kbuild
clk: sunxi-ng: a83t: Add M divider to TCON1 clock
clk: Prepare to remove asm-generic/clkdev.h
...
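The clk rate protection described in the pull message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_clk and the model_clk_*() helpers are hypothetical names invented for this example, not the kernel clk API, and the real framework also has to handle parent clks and per-consumer accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_clk {
	unsigned long rate;          /* current output rate in Hz */
	unsigned int protect_count;  /* consumers relying on this rate */
};

/* A consumer declares that the current rate must not change. */
void model_clk_rate_protect(struct model_clk *clk)
{
	clk->protect_count++;
}

/* The consumer no longer depends on the rate. */
void model_clk_rate_unprotect(struct model_clk *clk)
{
	assert(clk->protect_count > 0);
	clk->protect_count--;
}

/* Returns 0 on success, -1 if the rate is protected and would change. */
int model_clk_set_rate(struct model_clk *clk, unsigned long rate)
{
	if (rate == clk->rate)
		return 0;            /* nothing to change, always allowed */
	if (clk->protect_count > 0)
		return -1;           /* someone locked in the current rate */
	clk->rate = rate;
	return 0;
}
```

In this model an audio consumer would call model_clk_rate_protect() once playback starts, so a later model_clk_set_rate() from another user fails instead of silently changing the rate underneath it.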
Linus Torvalds [Fri, 2 Feb 2018 00:35:31 +0000 (16:35 -0800)]
Merge tag 'armsoc-drivers' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"A number of new drivers get added this time, along with many
low-priority bugfixes. The most interesting changes by subsystem are:
bus drivers:
- Updates to the Broadcom bus interface driver to support newer SoC
types
- The TI OMAP sysc driver now supports updated DT bindings
memory controllers:
- A new driver for Tegra186 gets added
- A new driver for the ti-emif sram, to allow relocating
suspend/resume handlers there
SoC specific:
- A new driver for Qualcomm QMI, the interface to the modem on MSM
SoCs
- A new driver for power domains on the actions S700 SoC
- A driver for the Xilinx Zynq VCU logicoreIP
reset controllers:
- A new driver for Amlogic Meson-AXG
- various bug fixes
tee subsystem:
- A new user interface got added to enable asynchronous communication
with the TEE supplicant.
- A new method of using user space memory for communication with the
TEE is added"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (84 commits)
of: platform: fix OF node refcount leak
soc: fsl: guts: Add a NULL check for devm_kasprintf()
bus: ti-sysc: Fix smartreflex sysc mask
psci: add CPU_IDLE dependency
soc: xilinx: Fix Kconfig alignment
soc: xilinx: xlnx_vcu: Use bitwise & rather than logical && on clkoutdiv
soc: xilinx: xlnx_vcu: Depends on HAS_IOMEM for xlnx_vcu
soc: bcm: brcmstb: Be multi-platform compatible
soc: brcmstb: biuctrl: exit without warning on non brcmstb platforms
Revert "soc: brcmstb: Only register SoC device on STB platforms"
bus: omap: add MODULE_LICENSE tags
soc: brcmstb: Only register SoC device on STB platforms
tee: shm: Potential NULL dereference calling tee_shm_register()
soc: xilinx: xlnx_vcu: Add Xilinx ZYNQMP VCU logicoreIP init driver
dt-bindings: soc: xilinx: Add DT bindings to xlnx_vcu driver
soc: xilinx: Create folder structure for soc specific drivers
of: platform: populate /firmware/ node from of_platform_default_populate_init()
soc: samsung: Add SPDX license identifiers
soc: qcom: smp2p: Use common error handling code in qcom_smp2p_probe()
tee: shm: don't put_page on null shm->pages
...
Linus Torvalds [Fri, 2 Feb 2018 00:17:40 +0000 (16:17 -0800)]
Merge tag 'armsoc-soc' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC platform updates from Arnd Bergmann:
"These are mostly minor bugfixes, cleanup and many defconfig updates to
support added drivers. In particular OMAP and PXA keep cleaning up the
legacy code base, as usual.
Nvidia adds some more SoC support code for Tegra 186.
For the first time in years, we are actually adding a non-DT platform
for the EP93xx based Liebherr controller BK3.1. It's a minor variation
of the EP93xx reference design and in active use, while EP93xx
apparently doesn't have enough new development to have any device tree
support"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (73 commits)
ARM: omap: hwmod: fix section mismatch warnings
ARM: pxa/tosa-bt: add MODULE_LICENSE tag
arm64: defconfig: enable CONFIG_ACPI_APEI_EINJ
arm64: defconfig: enable EDAC GHES option
arm64: defconfig: enable CONFIG_ACPI_APEI_MEMORY_FAILURE
ARM: imx_v6_v7_defconfig: enable CONFIG_CPU_FREQ_STAT
Wind down ARM/TANGO port
ARM: davinci: constify gpio_led
ARM: davinci: drop unneeded newline
soc: Add SoC driver for Gemini
ARM: SAMSUNG: Add SPDX license identifiers
ARM: S5PV210: Add SPDX license identifiers
ARM: S3C64XX: Add SPDX license identifiers
ARM: S3C24XX: Add SPDX license identifiers
ARM: EXYNOS: Add SPDX license identifiers
ARM: imx: remove unused imx3 pm definitions
ARM: imx: don't abort MMDC probe if power saving status doesn't match
ARM: imx_v6_v7_defconfig: enable RTC_DRV_MXC_V2
ARM: imx_v6_v7_defconfig: Add missing config for DART-MX6 SoM
ARM: davinci: Use PTR_ERR_OR_ZERO()
...
Linus Torvalds [Fri, 2 Feb 2018 00:07:54 +0000 (16:07 -0800)]
Merge tag 'armsoc-dt' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC device tree updates from Arnd Bergmann:
"We get a moderate number of new machines this time, and only one new
SoC variant (Actions S700):
Actions:
- S700 SoC and CubieBoard7 development board
- Allo.com Sparky Single-board-computer
Allwinner:
- Orange Pi R1 development board
- Libre Computer Board ALL-H3-CC H3 single-board computer
ASpeed ast2x00:
- Witherspoon: OpenPower Power9 server manufactured by IBM that uses the ASPEED ast2500
- Zaius: OpenPower Power9 server manufactured by Invatech that uses the ASPEED ast2500
- Q71L: Intel Xeon server manufactured by Quanta that uses the ASPEED ast2400
AT91:
- Axentia Nattis/Natte digital signage
- sama5d2 PTC-ek Evaluation board
Freescale/NXP i.MX:
- SolidRun Hummingboard2 development board
- Variscite DART-MX6 SoM and Carrier-board
- Technologic TS-4600 and TS-7970 development board
- Toradex Colibri iMX7D SoM board
- v1.5 variant of Solidrun Cubox-i and Hummingboard
Freescale/NXP Layerscape:
- Moxa UC-8410A Series industrial computer
Gemini:
- D-Link DNS-313 NAS enclosure
OMAP:
- LogicPD OMAP35xx SOM-LV devkit
- LogicPD OMAP35xx Torpedo devkit
Renesas:
- r8a77970 (V3M) Starter Kit board
- r8a7795 (M3-W) Salvator-XS board
We finally managed to get the dtc warnings under control, with no more
build-time warnings for bad device tree files. This includes fixes for
the majority of platforms, including nomadik, samsung, lpc32xx, STi,
spear, mediatek, freescale, qcom, realview, keystone, omap, kirkwood,
renesas, hisilicon, and broadcom.
Files get rearranged on a few platforms, in particular the Marvell
Armada 7K/8K device tree files are changed in preparation for future
SoC support, based on more than two of the same chips in one package,
and some boards get renamed for oxnas for consistency.
Finally, many existing SoCs gain descriptions for additional on-chip
devices that we can now support with kernel drivers:
- Allwinner A83t (drm, ethernet, i2c, ...), H3/H5 (USB-OTG)
- Amlogic AXG family (clk, pinctrl, pwm, ...), and others (vpu, hdmi)
- Aspeed clk controller support
- Freescale LS1088A, LS1021A device support
- Gemini Ethernet, PCI, TVE, panel
- Keystone gpio, qspi, more uarts
- Mediatek cpufreq, regulator, clock, reset
- Marvell thermal, cpufreq, nand
- Renesas SMP, thermal, timer, PWM, sound, phy, ipmmu
- Rockchip Mipi, GPU, display
- Samsung Exynos5433 PMU, power domain, nfc
- Spreadtrum: sc9860 clocks
- Tegra TX2 PSDI, HDMI, I2C, SMMU, display, fuse, ..."
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (690 commits)
arm64: dts: stratix10: fix SPI settings
ARM: dts: socfpga: add i2c reset signals
arm64: dts: stratix10: add USB ECC reset bit
arm64: dts: stratix10: enable USB on the devkit
ARM: dts: socfpga: disable over-current for Arria10 USB devkit
ARM: dts: Nokia N9: add support for up/down keys in the dts
ARM: dts: nomadik: add interrupt-parent for clcd
ARM: dts: Add ethernet to a bunch of platforms
ARM: dts: Add ethernet to the Gemini SoC
ARM: dts: rename oxnas dts files
ARM: dts: s5pv210: add interrupt-parent for ohci
ARM: lpc3250: fix uda1380 gpio numbers
ARM: dts: STi: Add gpio polarity for "hdmi,hpd-gpio" property
ARM: dts: dra7: Reduce shut down temperature of non-cpu thermal zones
ARM: dts: n900: Add aliases for lcd and tvout displays
ARM: dts: Update ti-sysc data for existing users
ARM: dts: Fix smartreflex compatible for omap3 shared mpu-iva instance
arm64: dts: marvell: armada-80x0: Fix pinctrl compatible string
arm: spear13xx: Fix spics gpio controller's warning
arm: spear13xx: Fix dmas cells
...
Linus Torvalds [Thu, 1 Feb 2018 21:36:15 +0000 (13:36 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" selects the syslog-like "<log level>[timestamp] text"
format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hand.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt at Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and cannot sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is an increasing number of people having problems with
printk-related softlockups. We might eventually need a better
solution. Anyway, this looks like a good start and a promising
direction.
- Do not allow scheduling in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifiers:
They were needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. This is now done
transparently by the "%ps" and "%pS" format specifiers.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove print_symbol() API:
It has been obsoleted by the "%pS" format specifier, and this change
helped to remove a few continuation lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
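The two console_msg_format values described in the pull message above can be illustrated with a small userspace formatter. This is a sketch only: model_format_console_msg() is a hypothetical helper written for this example, the "[%5lu.%06lu]" timestamp layout is an assumption modeled on typical dmesg output, and the syslog prefix here carries just the log level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code.
 * syslog_style == 0: "[timestamp] text"            ("default")
 * syslog_style != 0: "<log level>[timestamp] text" ("syslog")
 * Returns what snprintf() returns.
 */
int model_format_console_msg(char *buf, size_t size, int syslog_style,
			     unsigned int level, unsigned long sec,
			     unsigned long usec, const char *text)
{
	assert(buf != NULL && text != NULL);

	if (syslog_style)
		return snprintf(buf, size, "<%u>[%5lu.%06lu] %s",
				level, sec, usec, text);

	return snprintf(buf, size, "[%5lu.%06lu] %s", sec, usec, text);
}
```

With the syslog-like prefix kept on the console, a test harness can capture the full log once and filter by level afterwards, which matches the motivation given above.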
Linus Torvalds [Thu, 1 Feb 2018 21:18:25 +0000 (13:18 -0800)]
Merge tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Mask INTx from user if pdev->irq is zero (Alexey Kardashevskiy)
- Capability helper cleanup (Alex Williamson)
- Allow mmaps overlapping MSI-X vector table with region capability
exposing this feature (Alexey Kardashevskiy)
- mdev static cleanups (Xiongwei Song)
* tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio:
vfio: mdev: make a couple of functions and structure vfio_mdev_driver static
vfio-pci: Allow mapping MSIX BAR
vfio: Simplify capability helper
vfio-pci: Mask INTx if a device is not capabable of enabling it
Linus Torvalds [Thu, 1 Feb 2018 21:15:23 +0000 (13:15 -0800)]
Merge tag 'trace-v4.16' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"There aren't many changes to the tracing system this release. Mostly
small clean ups and fixes.
The biggest change is to how bprintf works. bprintf is used by
trace_printk() to just save the format and args of a printf call, and
the formatting is done when the trace buffer is read. This is done to
keep the formatting out of the fast path (this was recommended by
you). The issue is when arguments are de-referenced.
If a pointer is saved, and the format has something like "%*pbl", when
the buffer is read, it will de-reference the argument then. The
problem is if the data no longer exists. This can cause the kernel to
oops.
The fix for this was to have these dereferenced pointers formatted
at the time bprintf is called (the fast path), as this
guarantees that the data exists (and doesn't change later)"
* tag 'trace-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
vsprintf: Do not have bprintf dereference pointers
ftrace: Mark function tracer test functions noinline/noclone
trace_uprobe: Display correct offset in uprobe_events
tracing: Make sure the parsed string always terminates with '\0'
tracing: Clear parser->idx if only spaces are read
tracing: Detect the string nul character when parsing user input string
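The bprintf problem described in the pull message above comes down to saving a pointer versus saving the formatted text. The following userspace sketch contrasts the two; the struct and function names are hypothetical, and the data is overwritten rather than freed so that the example stays well defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical model of an event that only saves the pointer argument. */
struct model_deferred_event {
	const char *name;   /* dereferenced later, when the buffer is read */
};

/* Hypothetical model of an event that is formatted when it is recorded. */
struct model_eager_event {
	char text[64];      /* snapshot taken in the fast path */
};

void model_record_deferred(struct model_deferred_event *ev, const char *name)
{
	ev->name = name;    /* cheap, but the data may change or vanish */
}

void model_record_eager(struct model_eager_event *ev, const char *name)
{
	snprintf(ev->text, sizeof(ev->text), "dev=%s", name);
}

/* Formatting happens at read time, using whatever the pointer sees now. */
void model_read_deferred(const struct model_deferred_event *ev,
			 char *out, size_t size)
{
	assert(ev->name != NULL);
	snprintf(out, size, "dev=%s", ev->name);
}
```

If the string behind the saved pointer is modified between recording and reading, the deferred event reports the new contents while the eager event still holds the original text. In the kernel the memory could be gone entirely, which is the oops case the fix addresses.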
Linus Torvalds [Thu, 1 Feb 2018 20:20:53 +0000 (12:20 -0800)]
Merge branch 'KASAN-read_word_at_a_time'
Merge KASAN word-at-a-time fixups from Andrey Ryabinin.
The word-at-a-time optimizations have caused headaches for KASAN, since
the whole point is that we access byte streams in bigger chunks, and
KASAN can be unhappy about the potential extra access at the end of the
string.
We used to have a horrible hack in dcache, and then people got
complaints from the strscpy() case. This fixes it all up properly, by
adding an explicit helper for the "access byte stream one word at a
time" case.
* emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>:
fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
lib/strscpy: Shut up KASAN false-positives in strscpy()
compiler.h: Add read_word_at_a_time() function.
compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
Andrey Ryabinin [Thu, 1 Feb 2018 18:00:52 +0000 (21:00 +0300)]
fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
This reverts commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd.
It's no longer needed since dentry_string_cmp() now uses
read_word_at_a_time() to avoid kasan's reports.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Thu, 1 Feb 2018 18:00:51 +0000 (21:00 +0300)]
fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
dentry_string_cmp() performs word-at-a-time reads from 'cs' and may
read slightly more than was requested in kmalloc(). Normally this
would make KASAN report an out-of-bounds access, but this was
worked around by commit df4c0e36f1b1 ("fs: dcache: manually unpoison
dname after allocation to shut up kasan's reports").
This workaround is not perfect, since it allows out-of-bounds access to
the dentry's name for all code, not just dentry_string_cmp().
So it would be better to use read_word_at_a_time() instead and revert
commit df4c0e36f1b1.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Thu, 1 Feb 2018 18:00:50 +0000 (21:00 +0300)]
lib/strscpy: Shut up KASAN false-positives in strscpy()
strscpy() performs word-at-a-time optimistic reads, so it may
access memory past the end of the object. This is perfectly fine,
since strscpy() doesn't use that (past-the-end) data and makes sure the
optimistic read won't cross a page boundary.
Use the new read_word_at_a_time() to shut up KASAN.
Note that this could potentially hide some bugs. In the example below,
strscpy() will copy more than it should (1-3 extra uninitialized bytes):
    char dst[8];
    char *src;

    src = kmalloc(5, GFP_KERNEL);
    memset(src, 0xff, 5);
    strscpy(dst, src, 8);
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
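The word-at-a-time technique mentioned in the commit above can be sketched in userspace as follows. This is not the kernel's strscpy(): the function names are hypothetical, and the caller must pass a buffer whose readable size is a multiple of 8 bytes, so the wide reads here are always in bounds. The point is that each 8-byte load also covers bytes after the terminating NUL, and those bytes are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Nonzero if and only if at least one byte of v is 0x00. */
uint64_t model_has_zero_byte(uint64_t v)
{
	return (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
}

/*
 * Hypothetical helper: length of the string in buf, scanning one
 * 64-bit word at a time. padded_len must be a multiple of 8 and the
 * whole padded_len range must be readable. Returns padded_len if no
 * NUL byte is found.
 */
size_t model_wordwise_strnlen(const char *buf, size_t padded_len)
{
	assert(padded_len % sizeof(uint64_t) == 0);

	for (size_t off = 0; off < padded_len; off += sizeof(uint64_t)) {
		uint64_t word;

		/* One wide load; it may cover bytes past the NUL. */
		memcpy(&word, buf + off, sizeof(word));

		if (model_has_zero_byte(word)) {
			size_t i = off;

			/* Locate the exact NUL within this word. */
			while (buf[i] != '\0')
				i++;
			return i;
		}
	}
	return padded_len;
}
```

For a 16-byte buffer holding "hello", the first load reads all eight bytes even though only the first five belong to the string.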
Andrey Ryabinin [Thu, 1 Feb 2018 18:00:49 +0000 (21:00 +0300)]
compiler.h: Add read_word_at_a_time() function.
Sometimes we know that it's safe to do potentially out-of-bounds access
because we know it won't cross a page boundary. Still, KASAN will
report this as a bug.
Add read_word_at_a_time() function which is supposed to be used in such
cases. In read_word_at_a_time() KASAN performs a relaxed check - only the
first byte of access is validated.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
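The page-boundary argument in the commit above can be checked with simple address arithmetic. The sketch below assumes 4096-byte pages and 8-byte words; model_word_read_stays_in_page() is a hypothetical helper for illustration, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this sketch */
#define MODEL_WORD_SIZE 8u      /* assumed word size for this sketch */

/* True if an 8-byte read starting at addr touches only one page. */
bool model_word_read_stays_in_page(uintptr_t addr)
{
	uintptr_t first_page = addr / MODEL_PAGE_SIZE;
	uintptr_t last_page = (addr + MODEL_WORD_SIZE - 1) / MODEL_PAGE_SIZE;

	return first_page == last_page;
}
```

Because the page size is a multiple of the word size, every word-aligned address passes this check. That is why a word-aligned read past the end of an object can touch unrelated bytes but cannot touch a different page.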
Andrey Ryabinin [Thu, 1 Feb 2018 18:00:48 +0000 (21:00 +0300)]
compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
Instead of having two identical __read_once_size_nocheck() functions
with different attributes, consolidate all the differences in a new macro
__no_kasan_or_inline and use it. No functional changes.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Monakov [Thu, 1 Feb 2018 19:45:17 +0000 (22:45 +0300)]
net: pxa168_eth: add netconsole support
This implements ndo_poll_controller callback which is necessary to
enable netconsole.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 1 Feb 2018 18:26:57 +0000 (10:26 -0800)]
net: igmp: add a missing rcu locking section
Newly added igmpv3_get_srcaddr() needs to be called under rcu lock.
Timer callbacks do not ensure this locking.
=============================
WARNING: suspicious RCU usage
4.15.0+ #200 Not tainted
-----------------------------
./include/linux/inetdevice.h:216 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by syzkaller616973/4074:
#0: (&mm->mmap_sem){++++}, at: [<00000000bfce669e>] __do_page_fault+0x32d/0xc90 arch/x86/mm/fault.c:1355
#1: ((&im->timer)){+.-.}, at: [<00000000619d2f71>] lockdep_copy_map include/linux/lockdep.h:178 [inline]
#1: ((&im->timer)){+.-.}, at: [<00000000619d2f71>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1316
#2: (&(&im->lock)->rlock){+.-.}, at: [<000000005f833c5c>] spin_lock_bh include/linux/spinlock.h:315 [inline]
#2: (&(&im->lock)->rlock){+.-.}, at: [<000000005f833c5c>] igmpv3_send_report+0x98/0x5b0 net/ipv4/igmp.c:600
stack backtrace:
CPU: 0 PID: 4074 Comm: syzkaller616973 Not tainted 4.15.0+ #200
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4592
__in_dev_get_rcu include/linux/inetdevice.h:216 [inline]
igmpv3_get_srcaddr net/ipv4/igmp.c:329 [inline]
igmpv3_newpack+0xeef/0x12e0 net/ipv4/igmp.c:389
add_grhead.isra.27+0x235/0x300 net/ipv4/igmp.c:432
add_grec+0xbd3/0x1170 net/ipv4/igmp.c:565
igmpv3_send_report+0xd5/0x5b0 net/ipv4/igmp.c:605
igmp_send_report+0xc43/0x1050 net/ipv4/igmp.c:722
igmp_timer_expire+0x322/0x5c0 net/ipv4/igmp.c:831
call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2d7/0xb85 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:541 [inline]
smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:938
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Desnes Augusto Nunes do Rosario [Thu, 1 Feb 2018 18:04:30 +0000 (16:04 -0200)]
ibmvnic: fix firmware version when no firmware level has been provided by the VIOS server
Older versions of VIOS servers do not send the firmware level in the VPD
buffer for the ibmvnic driver. Thus, not only is the current message
misleading, but the firmware version reported by ethtool will be NULL.
Therefore, this patch fixes the firmware string and its warning.
Fixes: 4e6759be28e4 ("ibmvnic: Feature implementation of VPD for the ibmvnic driver")
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Feb 2018 17:29:21 +0000 (17:29 +0000)]
vmxnet3: remove redundant initialization of pointer 'rq'
Pointer rq is initialized, but this value is never read: it
is updated inside a for-loop. Remove the initialization and
move the declaration into the scope of the for-loop.
Cleans up clang warning:
drivers/net/vmxnet3/vmxnet3_drv.c:2763:27: warning: Value stored
to 'rq' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Feb 2018 17:10:18 +0000 (17:10 +0000)]
lan78xx: remove redundant initialization of pointer 'phydev'
Pointer phydev is initialized, but this value is never read: phydev
is immediately updated to a new value. Hence this initialization
is redundant and can be removed.
Cleans up clang warning:
drivers/net/usb/lan78xx.c:2009:21: warning: Value stored to 'phydev'
during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Feb 2018 16:58:42 +0000 (16:58 +0000)]
net: jme: remove unused initialization of 'rxdesc'
Pointer rxdesc is assigned a value that is never read, it is overwritten
by a new assignment inside a while loop hence the initial assignment
is redundant and can be removed.
Cleans up clang warning:
drivers/net/ethernet/jme.c:1074:17: warning: Value stored to 'rxdesc'
during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 1 Feb 2018 19:45:49 +0000 (11:45 -0800)]
Merge tag 'kconfig-v4.16' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig updates from Masahiro Yamada:
"A pretty big batch of Kconfig updates.
I have to mention the lexer and parser of Kconfig are now built from
real .l and .y sources, so flex and bison are now required for
building the kernel. Both of them (unlike gperf) have been stable for
a long time. This change has been tested for several weeks in
linux-next, and I have not received any problem reports about it.
Summary:
- add checks for mistakes, such as a choice default that is not in
the choice, or a help text that is given twice
- document data structure and complex code
- fix various memory leaks
- change Makefile to build lexer and parser instead of using
pre-generated C files
- drop 'boolean' keyword, which is equivalent to 'bool'
- use default 'yy' prefix and remove unneeded Make variables
- fix gettext() check for xconfig
- announce that oldnoconfig will be finally removed
- make 'Selected by:' and 'Implied by' readable in help and search
result
- hide silentoldconfig from 'make help' to stop confusing people
- fix misc things and cleanups"
* tag 'kconfig-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (37 commits)
kconfig: Remove silentoldconfig from help and docs; fix kconfig/conf's help
kconfig: make "Selected by:" and "Implied by:" readable
kconfig: announce removal of oldnoconfig if used
kconfig: fix make xconfig when gettext is missing
kconfig: Clarify menu and 'if' dependency propagation
kconfig: Document 'if' flattening logic
kconfig: Clarify choice dependency propagation
kconfig: Document SYMBOL_OPTIONAL logic
kbuild: remove unnecessary LEX_PREFIX and YACC_PREFIX
kconfig: use default 'yy' prefix for lexer and parser
kconfig: make conf_unsaved a local variable of conf_read()
kconfig: make xfgets() really static
kconfig: make input_mode static
kconfig: Warn if there is more than one help text
kconfig: drop 'boolean' keyword
kconfig: use bool instead of boolean for type definition attributes, again
kconfig: Remove menu_end_entry()
kconfig: Document important expression functions
kconfig: Document automatic submenu creation code
kconfig: Fix choice symbol expression leak
...
Linus Torvalds [Thu, 1 Feb 2018 19:43:45 +0000 (11:43 -0800)]
Merge tag 'kbuild-misc-v4.16' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild misc updates from Masahiro Yamada:
- add snap-pkg target to create Linux kernel snap package
- make out-of-tree creation of source packages fail correctly
- improve and fix several semantic patches
* tag 'kbuild-misc-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Coccinelle: coccicheck: fix typo
Coccinelle: memdup: drop spurious line
Coccinelle: kzalloc-simple: Rename kzalloc-simple to zalloc-simple
Coccinelle: ifnullfree: Trim the warning reported in report mode
Coccinelle: alloc_cast: Add more memory allocating functions to the list
Coccinelle: array_size: report even if include is missing
Coccinelle: kzalloc-simple: Add all zero allocating functions
kbuild: pkg: make out-of-tree rpm/deb-pkg build immediately fail
scripts/package: snap-pkg target
David S. Miller [Thu, 1 Feb 2018 19:41:46 +0000 (14:41 -0500)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix OOM that syzkaller triggers with ipt_replace.size = -1 and
IPT_SO_SET_REPLACE socket option, from Dmitry Vyukov.
2) Check for too long extension name in xt_request_find_{match|target},
which results in out-of-bounds reads, from Eric Dumazet.
3) Fix memory exhaustion bug in ipset hash:*net* types when adding ranges
that look like x.x.x.x-255.255.255.255, from Jozsef Kadlecsik.
4) Fix pointer leaks to userspace in x_tables, from Dmitry Vyukov.
5) Insufficient sanity checks in clusterip_tg_check(), also from Dmitry.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 1 Feb 2018 19:41:09 +0000 (11:41 -0800)]
Merge tag 'kbuild-v4.16' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- terminate the build correctly in case of fixdep errors
- clean up fixdep
- suppress packed-not-aligned warnings from GCC-8
- fix W= handling for extra DTC warnings
* tag 'kbuild-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix W= option checks for extra DTC warnings
Kbuild: suppress packed-not-aligned warning for default setting only
fixdep: use existing helper to check modular CONFIG options
fixdep: refactor parse_dep_file()
fixdep: move global variables to local variables of main()
fixdep: remove unneeded memcpy() in parse_dep_file()
fixdep: factor out common code for reading files
fixdep: use malloc() and read() to load dep_file to buffer
fixdep: remove unnecessary <arpa/inet.h> inclusion
fixdep: exit with error code in error branches of do_config_file()
Linus Torvalds [Thu, 1 Feb 2018 18:57:45 +0000 (10:57 -0800)]
Merge tag 'devicetree-for-4.16' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
- Convert to use memblock_virt_alloc in DT code which supports
bootmem arches. With this we can remove the arch specific
early_init_dt_alloc_memory_arch() functions.
- Enable running the DT unittests on UML
- Use SPDX license tags on DT files
- Fix early FDT kconfig ifdef logic
- Clean-up unittest Makefile
- Fix function comment for of_irq_parse_raw
- Add missing documentation for linux,initrd-{start,end} properties
- Clean-up of binding examples using uppercase hex
- Add trivial devices W83773G and Infineon TLV493D-A1B6
- Add missing STM32 SoC bindings
- Various small binding doc fixes
* tag 'devicetree-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (23 commits)
xtensa: remove arch specific early DT functions
x86: remove arch specific early_init_dt_alloc_memory_arch
nios2: remove arch specific early_init_dt_alloc_memory_arch
mips: remove arch specific early_init_dt_alloc_memory_arch
metag: remove arch specific early DT functions
cris: remove arch specific early DT functions
libfdt: remove unnecessary include directive from <linux/libfdt.h>
of: unittest: refactor Makefile
of/fdt: use memblock_virt_alloc for early alloc
of: Use SPDX license tag for DT files
of/fdt: Fix #ifdef dependency of early flattree declarations
dt-bindings: h8300 clocksource: correct spelling of pulse
dt-bindings: imx6q-pcie: Add required property for i.MX6SX
mmc: Don't reference Linux-specific OF_GPIO_ACTIVE_LOW flag in DT binding
dt-bindings: Use lower case hex in unit-addresses
dt-bindings: display: panel: Fix compatible string for Toshiba LT089AC29000
dt-bindings: Add Infineon TLV493D-A1B6
dt-bindings: mailbox: ti,message-manager: Fix interrupt name error
dt-bindings: chosen: Document linux,initrd-{start,end}
dt-bindings: arm: document supported STM32 SoC family
...
Linus Torvalds [Thu, 1 Feb 2018 18:49:58 +0000 (10:49 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input layer updates from Dmitry Torokhov:
- evdev interface has been adjusted to extend the life of timestamps on
32 bit systems to the year of 2108
- Synaptics RMI4 driver's PS/2 guest handling has been updated to
improve chances of detecting trackpoints on the pass-through port
- mms114 touchscreen controller driver has been updated to support
generic device properties and work with mms152 controllers
- Goodix driver now supports generic touchscreen properties
- couple of drivers for AVR32 architecture are gone as the architecture
support has been removed from the kernel
- gpio-tilt driver has been removed as there are no mainline users and
the driver itself is using legacy APIs and relies on platform data
- MODULE_LICENSE/MODULE_VERSION cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
Input: goodix - use generic touchscreen_properties
Input: mms114 - fix typo in definition
Input: mms114 - use BIT() macro instead of explicit shifting
Input: mms114 - replace mdelay with msleep
Input: mms114 - add support for mms152
Input: mms114 - drop platform data and use generic APIs
Input: mms114 - mark as direct input device
Input: mms114 - do not clobber interrupt trigger
Input: edt-ft5x06 - fix error handling for factory mode on non-M06
Input: stmfts - set IRQ_NOAUTOEN to the irq flag
Input: auo-pixcir-ts - delete an unnecessary return statement
Input: auo-pixcir-ts - remove custom log for a failed memory allocation
Input: da9052_tsi - remove unused mutex
Input: docs - use PROPERTY_ENTRY_U32() directly
Input: synaptics-rmi4 - log when we create a guest serio port
Input: synaptics-rmi4 - unmask F03 interrupts when port is opened
Input: synaptics-rmi4 - do not delete interrupt memory too early
Input: ad7877 - use managed resource allocations
Input: stmfts,s6sy671 - add SPDX identifier
Input: remove atmel-wm97xx touchscreen driver
...
Linus Torvalds [Thu, 1 Feb 2018 18:31:17 +0000 (10:31 -0800)]
Merge tag 'char-misc-4.16-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big pull request for char/misc drivers for 4.16-rc1.
There's a lot of stuff in here. Three new driver subsystems were added
for various types of hardware busses:
- siox
- slimbus
- soundwire
as well as a new vboxguest subsystem for the VirtualBox hypervisor
drivers.
There's also big updates from the FPGA subsystem, lots of Android
binder fixes, the usual handful of hyper-v updates, and lots of other
smaller driver updates.
All of these have been in linux-next for a long time, with no reported
issues"
* tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits)
char: lp: use true or false for boolean values
android: binder: use VM_ALLOC to get vm area
android: binder: Use true and false for boolean values
lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN
EISA: Delete error message for a failed memory allocation in eisa_probe()
EISA: Whitespace cleanup
misc: remove AVR32 dependencies
virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES
soundwire: Fix a signedness bug
uio_hv_generic: fix new type mismatch warnings
uio_hv_generic: fix type mismatch warnings
auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
uio_hv_generic: add rescind support
uio_hv_generic: check that host supports monitor page
uio_hv_generic: create send and receive buffers
uio: document uio_hv_generic regions
doc: fix documentation about uio_hv_generic
vmbus: add monitor_id and subchannel_id to sysfs per channel
vmbus: fix ABI documentation
uio_hv_generic: use ISR callback method
...
Andreas Gruenbacher [Mon, 8 Jan 2018 21:35:43 +0000 (22:35 +0100)]
gfs2: Glock dump performance regression fix
Restore an optimization removed in commit 7f19449553 ("Fix debugfs glocks
dump"): keep the glock hash table iterator active while the glock dump
file is held open. This avoids having to rescan the hash table from the
start for each read, with quadratically rising runtime.
In addition, use rhashtable_walk_peek for resuming a glock dump at the
current position: when a glock doesn't fit in the provided buffer
anymore, the next read must revisit the same glock.
Finally, also restart the dump from the first entry when we notice that
the hash table has been resized in gfs2_glock_seq_start.
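The resume behaviour described above can be sketched in user-space C. This is only a model, not the gfs2 or rhashtable code: struct dump_cursor and dump_read() are hypothetical names, and a plain array stands in for the hash table. The cursor persists across reads, so a read does not rescan from the start, and an entry that does not fit is only "peeked" at, so the next read revisits it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor kept alive while the dump file is open. */
struct dump_cursor {
    size_t pos; /* index of the next entry to emit */
};

/*
 * Copy as many whole entries as fit into buf and return the number of
 * bytes written. An entry that does not fit is not consumed: the cursor
 * stays on it, so the next call starts with that same entry.
 */
size_t dump_read(struct dump_cursor *cur, const char *const *entries,
                 size_t n_entries, char *buf, size_t buf_len)
{
    size_t used = 0;

    while (cur->pos < n_entries) {
        size_t len = strlen(entries[cur->pos]);

        if (len > buf_len - used)
            break; /* peek only: do not advance past this entry */
        memcpy(buf + used, entries[cur->pos], len);
        used += len;
        cur->pos++;
    }
    return used;
}
```

Because each call continues from cur->pos, the total work over a full dump is linear in the number of entries rather than quadratic. The sketch does not model a table resize, and an entry larger than the whole buffer would never make progress here.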
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Andreas Gruenbacher [Thu, 1 Feb 2018 10:12:13 +0000 (11:12 +0100)]
gfs2: Fix the crc32c dependency
Depend on LIBCRC32C which uses the crypto API to select the appropriate
crc32c implementation. With the CRYPTO and CRYPTO_CRC32C dependencies,
gfs2 would still need to use the crypto API directly like ext4 and btrfs
do, which isn't necessary.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Linus Torvalds [Thu, 1 Feb 2018 18:00:28 +0000 (10:00 -0800)]
Merge tag 'driver-core-4.16-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks that attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
Linus Torvalds [Thu, 1 Feb 2018 17:51:57 +0000 (09:51 -0800)]
Merge tag 'staging-4.16-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here are the big Staging and IIO driver patches for 4.16-rc1.
There is the normal amount of new IIO drivers added, like all
releases.
The networking IPX and the ncpfs filesystem are moved into the staging
tree, as they are on their way out of the kernel due to lack of use
anymore.
The visorbus subsystem finally has started moving out of the staging
tree to the "real" part of the kernel, and the most and fsl-mc
codebases are almost ready to move out, that will probably happen for
4.17-rc1 if all goes well.
Other than that, there is a bunch of license header cleanups in the
tree, along with the normal amount of coding style churn that we all
know and love for this codebase. I also got frustrated at the
Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
huge chunks of it that were never even being used.
Full details of everything is in the shortlog.
All of these patches have been in linux-next for a while with no
reported issues"
* tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
staging: rtl8723bs: remove a couple of redundant initializations
staging: comedi: reformat lines to 80 chars or less
staging: lustre: separate a connection destroy from free struct kib_conn
Staging: rtl8723bs: Use !x instead of NULL comparison
Staging: rtl8723bs: Remove dead code
Staging: rtl8723bs: Change names to conform to the kernel code
staging: ccree: Fix missing blank line after declaration
staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
staging: comedi: dt2811: remove redundant initialization of 'ns'
staging: wilc1000: fix alignments to match open parenthesis
staging: wilc1000: removed unnecessary defined enums typedef
staging: wilc1000: remove unnecessary use of parentheses
staging: rtl8192u: remove redundant initialization of 'timeout'
staging: sm750fb: fix CamelCase for dispSet var
staging: lustre: lnet/selftest: fix compile error on UP build
staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
...
Linus Torvalds [Thu, 1 Feb 2018 17:46:00 +0000 (09:46 -0800)]
Merge tag 'tty-4.16-rc1' of git://git./linux/kernel/git/gregkh/tty
Pull tty/staging driver updates from Greg KH:
"Here is the big tty/serial driver update for 4.16-rc1.
The usual number of various serial driver fixes and updates to try to
get them to work with crazy hardware configurations (seriously, how
many different ways are hardware engineers going to come up with to
hook up a simple UART?)
There are also some serdev bugfixes and updates, as well as a
smattering of other small fixes in here.
All have been in the linux-next tree for a while, with no reported
issues"
* tag 'tty-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits)
tty: serial: exar: Relocate sleep wake-up handling
tty: fix data race between tty_init_dev and flush of buf
serial: imx: fix endless loop during suspend
serial: core: mark port as initialized after successful IRQ change
serdev: only match serdev devices
serdev: do not generate modaliases for controllers
serial: mxs-auart: don't use GPIOF_* with gpiod_get_direction
serial: 8250_dw: Revert "Improve clock rate setting"
MAINTAINERS: Add myself as designated reviewer for 8250_dw
gpio: serial: max310x: Support open-drain configuration for GPIOs
serdev: Fix serdev_uevent failure on ACPI enumerated serdev-controllers
serial: 8250_ingenic: Parse earlycon options
serial: 8250_ingenic: Add support for the JZ4770 SoC
serial: core: Make uart_parse_options take const char* argument
serial: 8250_of: fix return code when probe function fails to get reset
serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
tty: n_gsm: Allow ADM response in addition to UA for control dlci
tty: omap-serial: Fix initial on-boot RTS GPIO level
tty: serial: jsm: Add one check against NULL pointer dereference
...
Linus Torvalds [Thu, 1 Feb 2018 17:40:49 +0000 (09:40 -0800)]
Merge tag 'usb-4.16-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY updates from Greg KH:
"Here is the big USB and PHY driver update for 4.16-rc1.
Along with the normally expected XHCI, MUSB, and Gadget driver
patches, there are some PHY driver fixes, license cleanups, sysfs
attribute cleanups, usbip changes, and a raft of other smaller fixes
and additions.
Full details are in the shortlog.
All of these have been in the linux-next tree for a long time with no
reported issues"
* tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (137 commits)
USB: serial: pl2303: new device id for Chilitag
USB: misc: fix up some remaining DEVICE_ATTR() usages
USB: musb: fix up one odd DEVICE_ATTR() usage
USB: atm: fix up some remaining DEVICE_ATTR() usage
USB: move many drivers to use DEVICE_ATTR_WO
USB: move many drivers to use DEVICE_ATTR_RO
USB: move many drivers to use DEVICE_ATTR_RW
USB: misc: chaoskey: Use true and false for boolean values
USB: storage: remove old wording about how to submit a change
USB: storage: remove invalid URL from drivers
usb: ehci-omap: don't complain on -EPROBE_DEFER when no PHY found
usbip: list: don't list devices attached to vhci_hcd
usbip: prevent bind loops on devices attached to vhci_hcd
USB: serial: remove redundant initializations of 'mos_parport'
usb/gadget: Fix "high bandwidth" check in usb_gadget_ep_match_desc()
usb: gadget: compress return logic into one line
usbip: vhci_hcd: update 'status' file header and format
USB: serial: simple: add Motorola Tetra driver
CDC-ACM: apply quirk for card reader
usb: option: Add support for FS040U modem
...
Linus Torvalds [Thu, 1 Feb 2018 17:37:30 +0000 (09:37 -0800)]
Merge git://git./linux/kernel/git/davem/ide
Pull small IDE cleanup from David Miller.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
ide: remove duplicated assignment to 'cursg'
Linus Torvalds [Thu, 1 Feb 2018 17:34:52 +0000 (09:34 -0800)]
Merge git://git./linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:
"Of note is the addition of a driver for the Data Analytics
Accelerator, and some small cleanups"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
oradax: Fix return value check in dax_attach()
sparc: vDSO: remove an extra tab
sparc64: drop unneeded compat include
sparc64: Oracle DAX driver
sparc64: Oracle DAX infrastructure
Linus Torvalds [Thu, 1 Feb 2018 17:31:04 +0000 (09:31 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"Bug fixes, small improvements and one notable change: the system call
table and the unistd.h header are now generated automatically with a
shell script from a text file"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/decompressor: discard __ksymtab and .eh_frame sections
s390: fix handling of -1 in set{,fs}[gu]id16 syscalls
s390/tools: generate header files in arch/s390/include/generated/
s390/syscalls: use generated syscall_table.h and unistd.h header files
s390/syscalls: add Makefile to generate system call header files
s390/syscalls: add syscalltbl script
s390/syscalls: add system call table
s390/decompressor: swap .text and .rodata.compressed sections
s390/sclp: fix .data section specification
s390/ipl: avoid usage of __section(.data)
s390/head: replace hard coded values with constants
s390/disassembler: add generated gen_opcode_table tool to .gitignore
s390: remove bogus system call table entries
s390/kprobes: remove duplicate includes
s390/dasd: Remove dead return code checks
s390/dasd: Simplify code
s390/vdso: revise CFI annotations of vDSO functions
s390/kernel: emit CFI data in .debug_frame and discard .eh_frame sections
Julia Lawall [Thu, 1 Feb 2018 09:20:55 +0000 (10:20 +0100)]
Coccinelle: coccicheck: fix typo
Correct spelling of "coccinelle".
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Christian Brauner [Thu, 1 Feb 2018 11:56:00 +0000 (12:56 +0100)]
rtnetlink: remove check for IFLA_IF_NETNSID
RTM_NEWLINK supports the IFLA_IF_NETNSID property since
5bb8ed075428b71492734af66230aa0c07fcc515 so we should not error out
when it is passed.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 1 Feb 2018 11:21:15 +0000 (12:21 +0100)]
rocker: fix possible null pointer dereference in rocker_router_fib_event_work
Currently, a rocker user may experience the following NULL pointer
dereference bug:
[ 3.062141] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[ 3.065163] IP: rocker_router_fib_event_work+0x36/0x110 [rocker]
The problem is that the rocker->wops pointer is still uninitialized: it
is set only when the first port is initialized. So move the port
initialization before registering the fib events.
Fixes: 936bd486564a ("rocker: use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Thu, 1 Feb 2018 10:26:23 +0000 (11:26 +0100)]
inet: Avoid unitialized variable warning in inet_unhash()
With gcc-4.1.2:
net/ipv4/inet_hashtables.c: In function ‘inet_unhash’:
net/ipv4/inet_hashtables.c:628: warning: ‘ilb’ may be used uninitialized in this function
While this is a false positive, it can easily be avoided by using the
pointer itself as the canary variable.
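A minimal sketch of the "pointer as canary" idea, under the assumption that the original code paired a separate boolean flag with a conditionally assigned pointer. The names struct bucket and release() are hypothetical and not taken from inet_hashtables.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for a listen hash bucket. */
struct bucket {
    int count;
};

/*
 * Returns the new count when a listener bucket was used, or -1 otherwise.
 * The pointer starts as NULL and is tested directly, so the compiler can
 * see that it is never read uninitialized; no separate flag is needed.
 */
int release(struct bucket *listen_bucket, int is_listener)
{
    struct bucket *b = NULL; /* NULL doubles as "not a listener" */

    if (is_listener)
        b = listen_bucket;

    /* ... work common to both cases would go here ... */

    if (b) {
        b->count--;
        return b->count;
    }
    return -1;
}
```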
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Thu, 1 Feb 2018 10:25:27 +0000 (11:25 +0100)]
net: bridge: Fix uninitialized error in br_fdb_sync_static()
With gcc-4.1.2:
net/bridge/br_fdb.c: In function ‘br_fdb_sync_static’:
net/bridge/br_fdb.c:996: warning: ‘err’ may be used uninitialized in this function
Indeed, if the list is empty, err will be uninitialized, and will be
propagated up as the function return value.
Fix this by preinitializing err to zero.
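The pattern can be shown with a small stand-alone sketch. It is not the br_fdb.c code: struct entry, sync_one() and sync_all() are hypothetical. The point is that when the loop body never runs, the return value comes only from the initializer.

```c
#include <stddef.h>

/* Hypothetical list entry; not the kernel's FDB entry type. */
struct entry {
    int value;
    struct entry *next;
};

/* Hypothetical per-entry operation: fails on negative values. */
static int sync_one(const struct entry *e)
{
    return e->value < 0 ? -1 : 0;
}

/* Walk the list and stop at the first failure. */
int sync_all(const struct entry *head)
{
    int err = 0; /* without this, an empty list would return garbage */
    const struct entry *e;

    for (e = head; e != NULL; e = e->next) {
        err = sync_one(e);
        if (err)
            break;
    }
    return err;
}
```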
Fixes: eb7935830d00b9e0 ("net: bridge: use rhashtable for fdbs")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ed Swierk [Thu, 1 Feb 2018 02:48:02 +0000 (18:48 -0800)]
openvswitch: Remove padding from packet before L3+ conntrack processing
IPv4 and IPv6 packets may arrive with lower-layer padding that is not
included in the L3 length. For example, a short IPv4 packet may have
up to 6 bytes of padding following the IP payload when received on an
Ethernet device with a minimum packet length of 64 bytes.
Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
and help() in nf_conntrack_ftp) assume skb->len reflects the length of
the L3 header and payload, rather than referring back to
ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
lower-layer padding.
In the normal IPv4 receive path, ip_rcv() trims the packet to
ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
in the br_netfilter receive path, br_validate_ipv4() and
br_validate_ipv6() trim the packet to the L3 length before invoking
netfilter hooks.
Currently in the OVS conntrack receive path, ovs_ct_execute() pulls
the skb to the L3 header but does not trim it to the L3 length before
calling nf_conntrack_in(NF_INET_PRE_ROUTING). When
nf_conntrack_proto_tcp encounters a packet with lower-layer padding,
nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log
message. While extra zero bytes don't affect the checksum, the length
in the IP pseudoheader does. That length is based on skb->len, and
without trimming, it doesn't match the length the sender used when
computing the checksum.
In ovs_ct_execute(), trim the skb to the L3 length before higher-layer
processing.
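The checksum argument above can be illustrated with a simplified user-space sketch. It is not the netfilter code: l4_checksum() and its helpers are hypothetical, and the routine is a plain one's-complement sum over an IPv4-style pseudo-header plus data. Trailing zero bytes leave the sum unchanged, but a different length in the pseudo-header does not.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words; an odd trailing
 * byte is treated as if followed by a zero byte. */
static uint32_t sum16(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);
    return sum;
}

/* Fold the carries back in and take the complement. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Checksum over a pseudo-header (source, destination, protocol, L4
 * length) followed by the first buf_len bytes of buf. l4_len is passed
 * separately so that the effect of a wrong length can be shown.
 */
uint16_t l4_checksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
                     uint16_t l4_len, const uint8_t *buf, size_t buf_len)
{
    uint32_t sum = 0;

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += proto;
    sum += l4_len;
    sum = sum16(buf, buf_len, sum);
    return fold(sum);
}
```

Calling l4_checksum() on a 6-byte segment and again on the same bytes followed by 6 zero padding bytes gives the same result as long as l4_len stays 6. Passing l4_len as 12, which is what an untrimmed skb->len would imply, gives a different value, and that mismatch is what shows up as a bad checksum.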
Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Wed, 31 Jan 2018 20:43:05 +0000 (15:43 -0500)]
tcp_bbr: fix pacing_gain to always be unity when using lt_bw
This commit fixes the pacing_gain to remain at BBR_UNIT (1.0) when
using lt_bw and returning from the PROBE_RTT state to PROBE_BW.
Previously, when using lt_bw, upon exiting PROBE_RTT and entering
PROBE_BW the bbr_reset_probe_bw_mode() code could sometimes randomly
end up with a cycle_idx of 0 and hence have bbr_advance_cycle_phase()
set a pacing gain above 1.0. In such cases this would result in a
pacing rate that is 1.25x higher than intended, potentially resulting
in a high loss rate for a little while until we stop using the lt_bw a
bit later.
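The intended behaviour can be sketched as follows. This is a model rather than the tcp_bbr.c code: probe_bw_pacing_gain() is a hypothetical helper, and the gain table is an assumption apart from the 1.25x entry at index 0, which the text above implies. Gains are fixed-point values in which BBR_UNIT represents 1.0.

```c
#include <stdbool.h>

#define BBR_SCALE 8
#define BBR_UNIT  (1 << BBR_SCALE) /* fixed-point 1.0 */
#define CYCLE_LEN 8

/* Assumed gain cycle: index 0 probes at 1.25x, index 1 drains, and the
 * remaining phases cruise at unity. */
static const int pacing_gain_cycle[CYCLE_LEN] = {
    BBR_UNIT * 5 / 4,
    BBR_UNIT * 3 / 4,
    BBR_UNIT, BBR_UNIT, BBR_UNIT, BBR_UNIT, BBR_UNIT, BBR_UNIT
};

/*
 * Pacing gain for PROBE_BW. While the long-term bandwidth estimate is
 * in use, the gain stays at unity no matter which cycle phase was
 * picked on entry.
 */
int probe_bw_pacing_gain(unsigned int cycle_idx, bool lt_use_bw)
{
    if (lt_use_bw)
        return BBR_UNIT;
    return pacing_gain_cycle[cycle_idx % CYCLE_LEN];
}
```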
This commit is a stable candidate for kernels back as far as 4.9.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reported-by: Beyers Cronje <bcronje@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 31 Jan 2018 16:14:25 +0000 (16:14 +0000)]
be2net: remove redundant initialization of 'head' and pointer txq
Variable head is initialized to a value that is never read and is
being updated to a new value a few lines later, hence this
initialization is redundant and can be safely removed as well
as the now unused pointer txq.
Cleans up clang warning:
drivers/net/ethernet/emulex/benet/be_main.c:996:6: warning: Value
stored to 'head' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Feb 2018 14:36:04 +0000 (09:36 -0500)]
Merge branch 'bnx2x-disable-GSO-on-too-large-packets'
Daniel Axtens says:
====================
bnx2x: disable GSO on too-large packets
We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0].
Ultimately we want a fix in the core, but that is very tricky to
backport. So for now, just stop the bnx2x driver from crashing.
When net-next re-opens I will send the fix to the core and a revert
for this.
v4 changes:
- fix compilation error with EXPORTs (patch 1)
- only do slow test if gso_size is greater than 9000 bytes (patch 2)
Thanks,
Daniel
[0]: https://patchwork.ozlabs.org/patch/859410/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens [Wed, 31 Jan 2018 03:15:34 +0000 (14:15 +1100)]
bnx2x: disable GSO where gso_size is too big for hardware
If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:
bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
... (dump of values continues) ...
Detect when the MAC length of a GSO packet is greater than the maximum
packet size (9700 bytes) and disable GSO.
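A rough sketch of the driver-side decision, assuming the shape suggested by the series notes (the full check is only run when gso_size is above 9000 bytes). keep_gso() and both constants are hypothetical names, not the bnx2x code, and the fast path assumes that headers never exceed 700 bytes.

```c
#include <stdbool.h>

#define HW_MAX_PACKET_LEN    9700u /* limit quoted in the text above */
#define SLOW_CHECK_THRESHOLD 9000u /* below this, skip the full check */

/*
 * Returns true if GSO can stay enabled for this packet. hdr_len is the
 * combined L2 + L3 + L4 header length of each resulting segment.
 */
bool keep_gso(unsigned int gso_size, unsigned int hdr_len)
{
    /* Fast path: small segments are assumed to fit with their headers. */
    if (gso_size <= SLOW_CHECK_THRESHOLD)
        return true;

    /* Slow path: compare the full per-segment length to the limit. */
    return hdr_len + gso_size <= HW_MAX_PACKET_LEN;
}
```

When keep_gso() returns false, a driver would drop the GSO feature bits for that packet so that the stack segments it in software before it reaches the hardware.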
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens [Wed, 31 Jan 2018 03:15:33 +0000 (14:15 +1100)]
net: create skb_gso_validate_mac_len()
If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?
Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.
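The calculation described above amounts to adding the header lengths to the GSO payload size and comparing the result with a limit. The sketch below models that with a hypothetical struct gso_info and helper names. The real functions operate on struct sk_buff, and this sketch does not cover any cases they handle beyond the simple sum.

```c
#include <stdbool.h>

/* Hypothetical summary of the fields that matter for the calculation. */
struct gso_info {
    unsigned int l2_hdr_len; /* e.g. Ethernet header */
    unsigned int l3_hdr_len; /* e.g. IP header */
    unsigned int l4_hdr_len; /* e.g. TCP header */
    unsigned int gso_size;   /* payload bytes per segment */
};

/* MAC length of one segment: L2 + L3 + L4 headers + payload. */
unsigned int gso_mac_seglen(const struct gso_info *g)
{
    return g->l2_hdr_len + g->l3_hdr_len + g->l4_hdr_len + g->gso_size;
}

/* Would every segment fit within max_len bytes? */
bool gso_validate_mac_len(const struct gso_info *g, unsigned int max_len)
{
    return gso_mac_seglen(g) <= max_len;
}
```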
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Torokhov [Thu, 1 Feb 2018 08:37:30 +0000 (00:37 -0800)]
Merge branch 'next' into for-linus
Prepare input updates for 4.16 merge window.
Linus Torvalds [Thu, 1 Feb 2018 03:25:25 +0000 (19:25 -0800)]
Merge tag 'docs-4.16' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Documentation updates for 4.16.
New stuff includes refcount_t documentation, errseq documentation,
kernel-doc support for nested structure definitions, the removal of
lots of crufty kernel-doc support for unused formats, SPDX tag
documentation, the beginnings of a manual for subsystem maintainers,
and lots of fixes and updates.
As usual, some of the changesets reach outside of Documentation/ to
effect kerneldoc comment fixes. It also adds the new LICENSES
directory, of which Thomas promises I do not need to be the
maintainer"
* tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits)
linux-next: docs-rst: Fix typos in kfigure.py
linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt
Documentation: Fix misconversion of #if
docs: add index entry for networking/msg_zerocopy
Documentation: security/credentials.rst: explain need to sort group_list
LICENSES: Add MPL-1.1 license
LICENSES: Add the GPL 1.0 license
LICENSES: Add Linux syscall note exception
LICENSES: Add the MIT license
LICENSES: Add the BSD-3-clause "Clear" license
LICENSES: Add the BSD 3-clause "New" or "Revised" License
LICENSES: Add the BSD 2-clause "Simplified" license
LICENSES: Add the LGPL-2.1 license
LICENSES: Add the LGPL 2.0 license
LICENSES: Add the GPL 2.0 license
Documentation: Add license-rules.rst to describe how to properly identify file licenses
scripts: kernel_doc: better handle show warnings logic
fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at
doc: md: Fix a file name to md-fault.c in fault-injection.txt
errseq: Add to documentation tree
...
Linus Torvalds [Thu, 1 Feb 2018 03:21:14 +0000 (19:21 -0800)]
Merge branch 'work.vmci' of git://git./linux/kernel/git/viro/vfs
Pull vmci iov_iter updates from Al Viro:
"Get rid of "is it an iovec or an entire array?" flags in vmxi - just
use iov_iter. Simplifies the living hell out of that code..."
* 'work.vmci' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vmci: the same on the send side...
vmci: simplify qp_dequeue_locked()
vmci: get rid of qp_memcpy_from_queue()
vmci: fix buf_size in case of iovec-based accesses
Linus Torvalds [Thu, 1 Feb 2018 03:18:12 +0000 (19:18 -0800)]
Merge branch 'work.whack-a-mole' of git://git./linux/kernel/git/viro/vfs
Pull asm/uaccess.h whack-a-mole from Al Viro:
"It's linux/uaccess.h, damnit... Oh, well - eventually they'll stop
cropping up..."
* 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
asm-prototypes.h: use linux/uaccess.h, not asm/uaccess.h
riscv: use linux/uaccess.h, not asm/uaccess.h...
ppc: for put_user() pull linux/uaccess.h, not asm/uaccess.h
Linus Torvalds [Thu, 1 Feb 2018 03:15:23 +0000 (19:15 -0800)]
Merge branch 'work.dcache' of git://git./linux/kernel/git/viro/vfs
Pull dcache updates from Al Viro:
"Neil Brown's d_move()/d_path() race fix"
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: close race between getcwd() and d_move()
Linus Torvalds [Thu, 1 Feb 2018 02:46:22 +0000 (18:46 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
- misc fixes
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm: remove PG_highmem description
tools, vm: new option to specify kpageflags file
mm/swap.c: make functions and their kernel-doc agree
mm, memory_hotplug: fix memmap initialization
mm: correct comments regarding do_fault_around()
mm: numa: do not trap faults on shared data section pages.
hugetlb, mbind: fall back to default policy if vma is NULL
hugetlb, mempolicy: fix the mbind hugetlb migration
mm, hugetlb: further simplify hugetlb allocation API
mm, hugetlb: get rid of surplus page accounting tricks
mm, hugetlb: do not rely on overcommit limit during migration
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
mm, hugetlb: unify core page allocation accounting and initialization
mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
mm/memcontrol.c: make local symbol static
mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
mm/compaction.c: fix comment for try_to_compact_pages()
mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
zsmalloc: use U suffix for negative literals being shifted
...
Daniel Vetter [Thu, 18 Jan 2018 15:40:16 +0000 (16:40 +0100)]
drm/ast: Load lut in crtc_commit
In the past the ast driver relied upon the fbdev emulation helpers to
call ->load_lut at boot-up. But since
commit b8e2b0199cc377617dc238f5106352c06dcd3fa2
Author: Peter Rosin <peda@axentia.se>
Date: Tue Jul 4 12:36:57 2017 +0200
drm/fb-helper: factor out pseudo-palette
that's cleaned up and drivers are expected to boot into a consistent
lut state. This patch fixes that.
Fixes: b8e2b0199cc3 ("drm/fb-helper: factor out pseudo-palette")
Cc: Peter Rosin <peda@axenita.se>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.14+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=198123
Cc: Bill Fraser <bill.fraser@gmail.com>
Reported-and-Tested-by: Bill Fraser <bill.fraser@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 1 Feb 2018 01:34:47 +0000 (11:34 +1000)]
Merge tag 'drm-misc-next-fixes-2018-01-31' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
This contains a fix to restrict what lessee can do with masters and
another one when waiting for timeouts on reservation objects.
* tag 'drm-misc-next-fixes-2018-01-31' of git://anongit.freedesktop.org/drm/drm-misc:
drm: Check for lessee in DROP_MASTER ioctl
dma-buf: fix reservation_object_wait_timeout_rcu once more v2
Miles Chen [Thu, 1 Feb 2018 00:21:27 +0000 (16:21 -0800)]
mm: remove PG_highmem description
Commit cbe37d093707 ("[PATCH] mm: remove PG_highmem") removed PG_highmem
to save a page flag. So the description of PG_highmem is no longer
needed.
Link: http://lkml.kernel.org/r/1517391212-2950-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Thu, 1 Feb 2018 00:21:23 +0000 (16:21 -0800)]
tools, vm: new option to specify kpageflags file
page-types currently hardcodes /proc/kpageflags as the file to parse.
This works when using the tool to examine the state of pageflags on the
same system, but it does not allow storing a snapshot of pageflags at a
given time for later debugging, nor examining it on a different system.
This allows the user to specify a saved version of kpageflags with a new
page-types -F option.
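A saved copy is as usable as the live file because /proc/kpageflags is a flat array of 64-bit flag words, one per page frame. The sketch below is not the page-types implementation; count_pages_with_flag() and KPF_LRU_BIT are names made up for this illustration, and the snapshot is assumed to have been written on a machine with the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 5 is the LRU flag in the kpageflags format (name chosen for this sketch). */
#define KPF_LRU_BIT 5

/*
 * Count the page frames in a kpageflags-format stream that have the given
 * flag bit set. The stream may be the live /proc file or a saved snapshot.
 */
static long count_pages_with_flag(FILE *snapshot, unsigned int bit)
{
	uint64_t flags;
	long count = 0;

	assert(bit < 64);
	while (fread(&flags, sizeof(flags), 1, snapshot) == 1) {
		if (flags & (UINT64_C(1) << bit))
			count++;
	}
	return count;
}
```

A caller would fopen() either /proc/kpageflags or the saved file and pass the stream in; nothing in the parsing depends on where the bytes came from.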
[akpm@linux-foundation.org: add "filename" to fix usage() string]
[rientjes@google.com: fix layout]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301840050.140969@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301458180.153857@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 1 Feb 2018 00:21:19 +0000 (16:21 -0800)]
mm/swap.c: make functions and their kernel-doc agree
Fix some basic kernel-doc notation in mm/swap.c:
- for function lru_cache_add_anon(), make its kernel-doc function name
match its function name and change colon to hyphen following the
function name
- for function pagevec_lookup_entries(), change the function parameter
name from nr_pages to nr_entries since that is more descriptive of
what the parameter actually is and then it matches the kernel-doc
comments also
Fix function kernel-doc to match the change in commit 67fd707f4681:
- drop the kernel-doc notation for @nr_pages from
pagevec_lookup_range() and correct the function description for that
change
Link: http://lkml.kernel.org/r/3b42ee3e-04a9-a6ca-6be4-f00752a114fe@infradead.org
Fixes: 67fd707f4681 ("mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:21:14 +0000 (16:21 -0800)]
mm, memory_hotplug: fix memmap initialization
Bharata has noticed that onlining newly added memory doesn't increase
the total memory, pointing to commit f7f99100d8d9 ("mm: stop zeroing
memory during allocation in vmemmap") as the culprit. This commit
changed how the memory for memmaps is initialized, moving it
from allocation time to initialization time. This works
properly for the early memmap init path.
It doesn't work for memory hotplug, though, because we need to mark
the page as reserved when the sparsemem section is created and later
initialize it completely during onlining. memmap_init_zone is called in
the early stage of onlining. With the current code it calls
__init_single_page and as such it clears the whole state, and
therefore online_pages_range skips those pages.
Fix this by skipping mm_zero_struct_page in __init_single_page for the
memory hotplug path. This is quite ugly, but unifying the early init
and memory hotplug init paths is a large project. Make sure we plug the
regression at least.
Link: http://lkml.kernel.org/r/20180130101141.GW21609@dhcp22.suse.cz
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
William Kucharski [Thu, 1 Feb 2018 00:21:11 +0000 (16:21 -0800)]
mm: correct comments regarding do_fault_around()
There are multiple comments surrounding do_fault_around that mention
fault_around_pages() and fault_around_mask(), two routines that do not
exist. These comments should be reworded to reference
fault_around_bytes, the value which is used to determine how much
do_fault_around() will attempt to read when processing a fault.
These comments should have been updated when fault_around_pages() and
fault_around_mask() were removed in commit aecd6f44266c ("mm: close race
between do_fault_around() and fault_around_bytes_set()").
Fixes: aecd6f44266c1 ("mm: close race between do_fault_around() and fault_around_bytes_set()")
Link: http://lkml.kernel.org/r/302D0B14-C7E9-44C6-8BED-033F9ACBD030@oracle.com
Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henry Willard [Thu, 1 Feb 2018 00:21:07 +0000 (16:21 -0800)]
mm: numa: do not trap faults on shared data section pages.
Workloads consisting of a large number of processes running the same
program with a very large shared data segment may experience performance
problems when numa balancing attempts to migrate the shared cow pages.
This manifests itself with many processes or tasks in
TASK_UNINTERRUPTIBLE state waiting for the shared pages to be migrated.
The program listed below simulates the conditions with these results
when run with 288 processes on a 144 core/8 socket machine.
Average throughput     Average throughput     Average throughput
with numa_balancing=0  with numa_balancing=1  with numa_balancing=1
                       without the patch      with the patch
---------------------  ---------------------  ---------------------
2118782                2021534                2107979
Complex production environments show less variability and fewer poorly
performing outliers, accompanied by a smaller number of processes
waiting on NUMA page migration, with this patch applied. In some cases,
%iowait drops from 16%-26% to 0.
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved.
 */
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>

/* Large initialized array: it lives in the data section and is shared COW after fork() */
int a[1000000] = {13};

int main(int argc, const char **argv)
{
	int n = 0;
	int i;
	pid_t pid;
	int stat;
	int *count_array;
	int cpu_count = 288;
	long total = 0;
	/* t2 starts out as the run duration in seconds (default 10) */
	struct timeval t1, t2 = {(argc > 1 ? atoi(argv[1]) : 10), 0};

	if (argc > 2)
		cpu_count = atoi(argv[2]);

	/* Shared array in which each child records its iteration count */
	count_array = mmap(NULL, cpu_count * sizeof(int),
			   (PROT_READ|PROT_WRITE),
			   (MAP_SHARED|MAP_ANONYMOUS), -1, 0);
	if (count_array == MAP_FAILED) {
		perror("mmap:");
		return 0;
	}

	/* Fork cpu_count children; a child leaves the loop with its index in i */
	for (i = 0; i < cpu_count; ++i) {
		pid = fork();
		if (pid <= 0)
			break;
		if ((i & 0xf) == 0)
			usleep(2);
	}

	if (pid != 0) {
		/* Parent (or fork failure): reap the children and sum their counts */
		if (i == 0) {
			perror("fork:");
			return 0;
		}
		for (;;) {
			pid = wait(&stat);
			if (pid < 0)
				break;
		}
		for (i = 0; i < cpu_count; ++i)
			total += count_array[i];
		printf("Total %ld\n", total);
		munmap(count_array, cpu_count * sizeof(int));
		return 0;
	}

	/* Child: read the whole shared array repeatedly until the deadline */
	gettimeofday(&t1, 0);
	timeradd(&t1, &t2, &t1);
	while (timercmp(&t2, &t1, <)) {
		int b = 0;
		int j;

		for (j = 0; j < 1000000; j++)
			b += a[j];
		gettimeofday(&t2, 0);
		n++;
	}
	count_array[i] = n;
	return 0;
}
This patch changes change_pte_range() to skip shared copy-on-write pages
when called from change_prot_numa().
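The kind of test this implies can be modelled in userspace as shown below. This is a sketch under assumptions, not the patch itself: the MODEL_* flags, struct page_model and both helpers are hypothetical stand-ins, and "shared copy-on-write" is taken here to mean a private, potentially writable mapping whose page is mapped by more than one process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VMA flags; the values are arbitrary. */
#define MODEL_VM_SHARED   0x1u
#define MODEL_VM_MAYWRITE 0x2u

/* Hypothetical page model: only the number of mappings matters here. */
struct page_model {
	int mapcount;
};

/* A copy-on-write mapping is private (not shared) and may become writable. */
static bool model_is_cow_mapping(unsigned int vm_flags)
{
	return (vm_flags & (MODEL_VM_SHARED | MODEL_VM_MAYWRITE)) == MODEL_VM_MAYWRITE;
}

/* Skip NUMA hinting for a COW page that is still shared by several mappers. */
static bool model_skip_numa_hinting(unsigned int vm_flags, const struct page_model *page)
{
	assert(page != NULL);
	return model_is_cow_mapping(vm_flags) && page->mapcount != 1;
}
```

In the test program above, the pages of the array a[] are mapped by all 288 children, so a check of this shape would leave them alone, while a page that a process has already copied for itself (mapcount of 1) would still be eligible.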
NOTE: change_prot_numa() is nominally called from task_numa_work() and
queue_pages_test_walk(). task_numa_work() is the auto NUMA balancing
path, and queue_pages_test_walk() is part of explicit NUMA policy
management. However, queue_pages_test_walk() only calls
change_prot_numa() when MPOL_MF_LAZY is specified and currently that is
not allowed, so change_prot_numa() is only called from auto NUMA
balancing.
In the case of explicit NUMA policy management, shared pages are not
migrated unless MPOL_MF_MOVE_ALL is specified, and MPOL_MF_MOVE_ALL
depends on CAP_SYS_NICE. Currently, there is no way to pass information
about MPOL_MF_MOVE_ALL to change_pte_range. This will have to be fixed
if MPOL_MF_LAZY is enabled and MPOL_MF_MOVE_ALL is to be honored in lazy
migration mode.
task_numa_work() skips the read-only VMAs of programs and shared
libraries.
Link: http://lkml.kernel.org/r/1516751617-7369-1-git-send-email-henry.willard@oracle.com
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:21:03 +0000 (16:21 -0800)]
hugetlb, mbind: fall back to default policy if vma is NULL
Dan Carpenter has noticed that the mbind migration callback (new_page) can
get a NULL vma pointer and choke on it inside alloc_huge_page_vma, which
relies on the VMA to get the hstate. We used to BUG_ON this case but
the BUG_ON has been removed recently by "hugetlb, mempolicy: fix the
mbind hugetlb migration".
The proper way to handle this is to get the hstate from the migrated
page and rely on huge_node (resp. get_vma_policy) to do the right thing
with a NULL VMA. We are currently falling back to the default mempolicy
in that case, which is in line with what the THP path is doing here.
Link: http://lkml.kernel.org/r/20180110104712.GR1732@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:21:00 +0000 (16:21 -0800)]
hugetlb, mempolicy: fix the mbind hugetlb migration
The do_mbind migration code relies on alloc_huge_page_noerr for hugetlb
pages. alloc_huge_page_noerr uses alloc_huge_page, which is a high-level
allocation function that has to take care of reserves, overcommit and
hugetlb cgroup accounting. None of that is really required for page
migration because the new page is only temporary and will either replace
the original page or be dropped. This is essentially the same as for
other migration call paths and there shouldn't be any reason to handle
mbind in a special way.
The current implementation is even suboptimal because the migration
might fail just because the hugetlb cgroup limit is reached, or the
overcommit is saturated.
Fix this by making mbind like other hugetlb migration paths. Add a new
migration helper alloc_huge_page_vma as a wrapper around
alloc_huge_page_nodemask with additional mempolicy handling.
alloc_huge_page_noerr has no more users and it can go.
Link: http://lkml.kernel.org/r/20180103093213.26329-7-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:20:56 +0000 (16:20 -0800)]
mm, hugetlb: further simplify hugetlb allocation API
The hugetlb allocator has several layers of allocation functions depending
on the purpose of the allocation. There are two allocators depending
on whether the page can be allocated from the page allocator or we need
a contiguous allocator. This is currently opencoded in
alloc_fresh_huge_page which is the only path that might allocate giga
pages which require the latter allocator. Create alloc_fresh_huge_page
which hides this implementation detail and use it in all callers which
hardcoded the buddy allocator path (__hugetlb_alloc_buddy_huge_page).
This shouldn't introduce any functional change because both migration and
surplus allocators exclude giga pages explicitly.
While we are at it let's do some renaming. The current scheme is not
consistent and overly painful to read and understand. Get rid of
prefix underscores from most functions. There is no real reason to make
names longer.
* alloc_fresh_huge_page is the new layer to abstract underlying
allocator
* __hugetlb_alloc_buddy_huge_page becomes the shorter and neater
alloc_buddy_huge_page.
* Former alloc_fresh_huge_page becomes alloc_pool_huge_page because we put
the new page directly to the pool
* alloc_surplus_huge_page can drop the opencoded prep_new_huge_page code
as it uses alloc_fresh_huge_page now
* others lose their excessive prefix underscores to make names shorter
[dan.carpenter@oracle.com: fix double unlock bug in alloc_surplus_huge_page()]
Link: http://lkml.kernel.org/r/20180109200559.g3iz5kvbdrz7yydp@mwanda
Link: http://lkml.kernel.org/r/20180103093213.26329-6-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:20:52 +0000 (16:20 -0800)]
mm, hugetlb: get rid of surplus page accounting tricks
alloc_surplus_huge_page increases the pool size and the number of
surplus pages opportunistically to prevent races with pool size
changes. See commit d1c3fb1f8f29 ("hugetlb: introduce
nr_overcommit_hugepages sysctl") for more details.
The resulting code is unnecessarily hairy, causes code duplication and
doesn't allow sharing the allocation paths. Moreover, pool size changes
tend to be very rare, so optimizing for them is not really reasonable.
Simplify the code and allow allocating a fresh surplus page as long as
we are under the overcommit limit, then recheck the condition after
the allocation and drop the new page if the situation has changed. This
should provide a reasonable guarantee that abrupt allocation requests
will not go way off the limit.
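The check, allocate, recheck pattern described above can be sketched in userspace like this. It is a model only: struct pool_model, the race hook and the function names are hypothetical, malloc() stands in for the page allocator, and the locking that the real code needs around the two checks is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the pool counters (not the kernel's struct hstate). */
struct pool_model {
	unsigned long surplus_pages;	/* surplus pages currently accounted */
	unsigned long overcommit_limit;	/* maximum number of surplus pages */
};

/* Hypothetical hook standing in for what other CPUs do while we allocate. */
typedef void (*race_hook_t)(struct pool_model *pool);

/* Example hook: a competing allocation takes one surplus slot. */
static void model_competing_alloc(struct pool_model *pool)
{
	pool->surplus_pages++;
}

/* Returns the new page, or NULL if the limit was or became exhausted. */
static void *model_alloc_surplus(struct pool_model *pool, size_t size,
				 race_hook_t during_alloc)
{
	void *page;

	assert(pool != NULL);

	/* First check: do not even try when already at the limit. */
	if (pool->surplus_pages >= pool->overcommit_limit)
		return NULL;

	/* The allocation runs without the counters being protected. */
	page = malloc(size);
	if (during_alloc)
		during_alloc(pool);
	if (!page)
		return NULL;

	/* Recheck: if the situation changed meanwhile, drop the new page. */
	if (pool->surplus_pages >= pool->overcommit_limit) {
		free(page);
		return NULL;
	}

	pool->surplus_pages++;
	return page;
}
```

The only cost of losing the race is one wasted allocation, which matches the trade-off the commit message accepts for a rare condition.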
If we consider races with the pool shrinking and enlarging then we
should be reasonably safe as well. In the first case we are off by one
in the worst case and the second case should work OK because the page is
not yet visible. We can waste CPU cycles for the allocation but that
should be acceptable for a relatively rare condition.
Link: http://lkml.kernel.org/r/20180103093213.26329-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:20:48 +0000 (16:20 -0800)]
mm, hugetlb: do not rely on overcommit limit during migration
hugepage migration relies on __alloc_buddy_huge_page to get a new page.
This has 2 main disadvantages.
1) it doesn't allow migrating any huge page if the pool is used
completely, which is not an exceptional case as the pool is static and
unused memory is just wasted.
2) it leads to weird semantics when migration between two NUMA nodes
might increase the pool size of the destination NUMA node while the
page is in use. The issue is caused by per NUMA node surplus pages
tracking (see free_huge_page).
Address both issues by changing how we allocate and account
pages allocated for migration. Those should be temporary by definition. So
we mark them that way (we will abuse page flags in the 3rd page) and
update free_huge_page to free such pages to the page allocator. The page
migration path then just transfers the temporary status from the new page
to the old one, which will be freed on the last reference. The global
surplus count will never change during this path but we still have to be
careful when migrating a per-node surplus page. This is now handled in
move_hugetlb_state which is called from the migration path and it copies
the hugetlb specific page state and fixes up the accounting when needed.
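The state transfer described above can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel function: struct hstate_model, struct hpage_model and model_move_state() are hypothetical names, only two nodes are modelled, and locking and cgroup handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 2

/* Hypothetical per-node surplus accounting (not the kernel's struct hstate). */
struct hstate_model {
	unsigned long surplus_per_node[MODEL_MAX_NODES];
};

/* Hypothetical huge page: its NUMA node and whether it is marked temporary. */
struct hpage_model {
	int nid;
	bool temporary;
};

/*
 * After a successful migration, hand the temporary marker from the new page
 * to the old one and move a per-node surplus count along with the data.
 */
static void model_move_state(struct hstate_model *h,
			     struct hpage_model *oldp,
			     struct hpage_model *newp)
{
	assert(h != NULL && oldp != NULL && newp != NULL);
	assert(oldp->nid >= 0 && oldp->nid < MODEL_MAX_NODES);
	assert(newp->nid >= 0 && newp->nid < MODEL_MAX_NODES);

	if (!newp->temporary)
		return;

	/* The old page is now the one to be freed on the last reference. */
	oldp->temporary = true;
	newp->temporary = false;

	/* The global surplus total is unchanged; only the node split moves. */
	if (h->surplus_per_node[oldp->nid] > 0) {
		h->surplus_per_node[oldp->nid]--;
		h->surplus_per_node[newp->nid]++;
	}
}
```

In this model, migrating a surplus page from node 0 to node 1 leaves the sum of the per-node counters untouched, which is the property the paragraph above asks for.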
Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better
reflect its purpose. The new allocation routine for the migration path
is __alloc_migrate_huge_page.
The user visible effect of this patch is that migrated pages are really
temporary and they travel between NUMA nodes as per the migration
request:
Before migration
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
After
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
With the previous implementation, both nodes would have nr_hugepages:1
until the page is freed.
Link: http://lkml.kernel.org/r/20180103093213.26329-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Feb 2018 00:20:44 +0000 (16:20 -0800)]
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
Gigantic hugetlb pages were ingrown to the hugetlb code as an alien
species with a lot of special casing. The allocation path is not an
exception. Unnecessarily so to be honest. It is true that the
underlying allocator is different but that is an implementation detail.
This patch unifies the hugetlb allocation path that prepares fresh
pool pages. alloc_fresh_gigantic_page basically copies
alloc_fresh_huge_page logic so we can move everything there. This will
simplify set_max_huge_pages which doesn't have to care about what kind
of huge page we allocate.
Link: http://lkml.kernel.org/r/20180103093213.26329-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>