Lipeng [Tue, 19 Sep 2017 16:17:10 +0000 (17:17 +0100)]
net: hns3: Fixes initialization of phy address from firmware
Default phy address of every port is 0. Therefore, phy address for
each port need to be fetched from firmware and device initialized
with fetched non-default phy address.
Fixes:
6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 Sep 2017 16:15:59 +0000 (09:15 -0700)]
bpf: do not disable/enable BH in bpf_map_free_id()
syzkaller reported following splat [1]
Since hard irq are disabled by the caller, bpf_map_free_id()
should not try to enable/disable BH.
Another solution would be to change htab_map_delete_elem() to
defer the free_htab_elem() call after
raw_spin_unlock_irqrestore(&b->lock, flags), but this might be not
enough to cover other code paths.
[1]
WARNING: CPU: 1 PID: 8052 at kernel/softirq.c:161 __local_bh_enable_ip
+0x1e/0x160 kernel/softirq.c:161
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 8052 Comm: syz-executor1 Not tainted 4.13.0-next-
20170915+
#23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:181
__warn+0x1c4/0x1d9 kernel/panic.c:542
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:__local_bh_enable_ip+0x1e/0x160 kernel/softirq.c:161
RSP: 0018:
ffff8801cdcd7748 EFLAGS:
00010046
RAX:
0000000000000082 RBX:
0000000000000201 RCX:
0000000000000000
RDX:
1ffffffff0b5933c RSI:
0000000000000201 RDI:
ffffffff85ac99e0
RBP:
ffff8801cdcd7758 R08:
ffffffff85b87158 R09:
1ffff10039b9aec6
R10:
ffff8801c99f24c0 R11:
0000000000000002 R12:
ffffffff817b0b47
R13:
dffffc0000000000 R14:
ffff8801cdcd77e8 R15:
0000000000000001
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:176 [inline]
_raw_spin_unlock_bh+0x30/0x40 kernel/locking/spinlock.c:207
spin_unlock_bh include/linux/spinlock.h:361 [inline]
bpf_map_free_id kernel/bpf/syscall.c:197 [inline]
__bpf_map_put+0x267/0x320 kernel/bpf/syscall.c:227
bpf_map_put+0x1a/0x20 kernel/bpf/syscall.c:235
bpf_map_fd_put_ptr+0x15/0x20 kernel/bpf/map_in_map.c:96
free_htab_elem+0xc3/0x1b0 kernel/bpf/hashtab.c:658
htab_map_delete_elem+0x74d/0x970 kernel/bpf/hashtab.c:1063
map_delete_elem kernel/bpf/syscall.c:633 [inline]
SYSC_bpf kernel/bpf/syscall.c:1479 [inline]
SyS_bpf+0x2188/0x46a0 kernel/bpf/syscall.c:1451
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes:
f3f1c054c288 ("bpf: Introduce bpf_map ID")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Gruenbacher [Tue, 19 Sep 2017 10:41:37 +0000 (12:41 +0200)]
rhashtable: Documentation tweak
Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table. Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already lead to a bug.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jerome Brunet [Mon, 18 Sep 2017 12:59:20 +0000 (14:59 +0200)]
net: phy: Kconfig: Fix PHY infrastructure menu in menuconfig
Since the integration of PHYLINK, the configuration option which
used to be under the PHY infrastructure menu in menuconfig ended
up one level up (the network device driver section)
By placing PHYLINK option right after PHYLIB entry, it broke the
way Kconfig used to build the menu. See kconfig-language.txt, section
"Menu structure", 2nd method.
This is fixed by placing the PHYLINK option just before PHYLIB.
Fixes:
9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Sep 2017 17:51:08 +0000 (10:51 -0700)]
Merge tag 'mac80211-for-davem-2017-11-19' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two netlink fixes, both allowing privileged users
to crash the kernel with malformed netlink messages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Tue, 19 Sep 2017 09:54:34 +0000 (12:54 +0300)]
MAINTAINERS: Remove Yuval Mintz from maintainers list
Remove Yuval from maintaining the bnx2x & qed* modules as he is no longer
working for the company. Thanks Yuval for your huge contributions and
tireless efforts over the many years and various companies.
Ariel
Signed-off-by: Ariel Elior <aelior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 18 Sep 2017 23:31:30 +0000 (16:31 -0700)]
net: systemport: Fix 64-bit statistics dependency
There are several problems with commit
10377ba7673d ("net: systemport:
Support 64bit statistics", first one got fixed in
7095c973453e ("net:
systemport: Fix 64-bit stats deadlock").
The second problem is that this specific code updates the
stats64.tx_{packets,bytes} from ndo_get_stats64() and that is what we
are returning to ethtool -S. If we are not running a tool that involves
calling ndo_get_stats64(), then we won't get updated ethtool stats.
The solution to this is to update the stats from both call sites,
factoring that into a specific function, While at it, don't just check
the sizeof() but also the type of the statistics in order to use the
64-bit stats seqlock.
Fixes:
10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 18 Sep 2017 20:03:43 +0000 (13:03 -0700)]
8139too: revisit napi_complete_done() usage
It seems we have to be more careful in napi_complete_done()
use. This patch is not a revert, as it seems we can
avoid bug that Ville reported by moving the napi_complete_done()
test in the spinlock section.
Many thanks to Ville for detective work and all tests.
Fixes:
617f01211baf ("8139too: use napi_complete_done()")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Mon, 18 Sep 2017 18:05:16 +0000 (11:05 -0700)]
tcp: remove two unused functions
remove tcp_may_send_now and tcp_snd_test that are no longer used
Fixes:
840a3cbe8969 ("tcp: remove forward retransmit feature")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Mon, 18 Sep 2017 13:03:46 +0000 (15:03 +0200)]
bpf: devmap: pass on return value of bpf_map_precharge_memlock
If bpf_map_precharge_memlock in dev_map_alloc, -ENOMEM is returned
regardless of the actual error produced by bpf_map_precharge_memlock.
Fix it by passing on the error returned by bpf_map_precharge_memlock.
Also return -EINVAL instead of -ENOMEM if the page count overflow check
fails.
This makes dev_map_alloc match the behavior of other bpf maps' alloc
functions wrt. return values.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 18 Sep 2017 11:35:37 +0000 (17:05 +0530)]
bnxt_en: check for ingress qdisc in flower offload
Check for ingress-only qdisc for flower offload, as other qdiscs
are not supported for flower offload.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 16 Sep 2017 20:10:06 +0000 (13:10 -0700)]
Documentation: networking: fix ASCII art in switchdev.txt
Fix ASCII art in Documentation/networking/switchdev.txt:
Change non-ASCII "spaces" to ASCII spaces.
Change 2 erroneous '+' characters in ASCII art to '-' (at the '*'
characters below):
line 32:
+--+----+----+----+-*--+----+---+ +-----+-----+
line 41:
+--------------+---*------------+
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Sat, 16 Sep 2017 12:02:21 +0000 (14:02 +0200)]
net/sched: cls_matchall: fix crash when used with classful qdisc
this script, edited from Linux Advanced Routing and Traffic Control guide
tc q a dev en0 root handle 1: htb default a
tc c a dev en0 parent 1: classid 1:1 htb rate 6mbit burst 15k
tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
ping $address -c1
tc -s c s dev en0
classifies traffic to 1:b or 1:a, depending on whether the packet matches
or not the pattern $clsargs of filter $clsname. However, when $clsname is
'matchall', a systematic crash can be observed in htb_classify(). HTB and
classful qdiscs don't assign initial value to struct tcf_result, but then
they expect it to contain valid values after filters have been run. Thus,
current 'matchall' ignores the TCA_MATCHALL_CLASSID attribute, configured
by user, and makes HTB (and classful qdiscs) dereference random pointers.
By assigning head->res to *res in mall_classify(), before the actions are
invoked, we fix this crash and enable TCA_MATCHALL_CLASSID functionality,
that had no effect on 'matchall' classifier since its first introduction.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213
Reported-by: Jiri Benc <jbenc@redhat.com>
Fixes:
b87f7936a932 ("net/sched: introduce Match-all classifier")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 15 Sep 2017 07:58:33 +0000 (15:58 +0800)]
ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline
If ipv6 has been disabled from cmdline since kernel started, it makes
no sense to allow users to create any ip6 tunnel. Otherwise, it could
some potential problem.
Jianlin found a kernel crash caused by this in ip6_gre when he set
ipv6.disable=1 in grub:
[ 209.588865] Unable to handle kernel paging request for data at address 0x00000080
[ 209.588872] Faulting instruction address: 0xc000000000a3aa6c
[ 209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
[ 209.589062] NIP [
c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
[ 209.589071] LR [
c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
[ 209.589076] Call Trace:
[ 209.589097] fib6_rule_lookup+0x50/0xb0
[ 209.589106] rt6_lookup+0xc4/0x110
[ 209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
[ 209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
[ 209.589134] rtnl_newlink+0x798/0xb80
[ 209.589142] rtnetlink_rcv_msg+0xec/0x390
[ 209.589151] netlink_rcv_skb+0x138/0x150
[ 209.589159] rtnetlink_rcv+0x48/0x70
[ 209.589169] netlink_unicast+0x538/0x640
[ 209.589175] netlink_sendmsg+0x40c/0x480
[ 209.589184] ___sys_sendmsg+0x384/0x4e0
[ 209.589194] SyS_sendmsg+0xd4/0x140
[ 209.589201] SyS_socketcall+0x3e0/0x4f0
[ 209.589209] system_call+0x38/0xe0
This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
disabled from cmdline.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fahad Kunnathadi [Fri, 15 Sep 2017 06:31:58 +0000 (12:01 +0530)]
net: phy: Fix mask value write on gmii2rgmii converter speed register
To clear Speed Selection in MDIO control register(0x10),
ie, clear bits 6 and 13 to zero while keeping other bits same.
Before AND operation,The Mask value has to be perform with bitwise NOT
operation (ie, ~ operator)
This patch clears current speed selection before writing the
new speed settings to gmii2rgmii converter
Fixes:
f411a6160bd4 ("net: phy: Add gmiitorgmii converter support")
Signed-off-by: Fahad Kunnathadi <fahad.kunnathadi@dexceldesigns.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 15 Sep 2017 04:00:07 +0000 (12:00 +0800)]
ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen
which only includes encap_hlen + tun_hlen. It means greh and inner header
would be over written by ipv6 stuff and ipv6h might have no chance to set
up.
Jianlin found this issue when using remote any on ip6_gre, the packets he
captured on gre dev are truncated:
22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0) \
payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
8184
It should also skb_push ipv6hdr so that ipv6h points to the right position
to set ipv6 stuff up.
This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents
in ip6gre_header.
Fixes:
c12b395a4664 ("gre: Support GRE over IPv6")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 18 Sep 2017 20:46:36 +0000 (22:46 +0200)]
nl80211: fix null-ptr dereference on invalid mesh configuration
If TX rates are specified during mesh join, the channel must
also be specified. Check the channel pointer to avoid a null
pointer dereference if it isn't.
Reported-by: Jouni Malinen <j@w1.fi>
Fixes:
8564e38206de ("cfg80211: add checks for beacon rate, extend to mesh")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Subash Abhinov Kasiviswanathan [Thu, 14 Sep 2017 01:30:51 +0000 (19:30 -0600)]
udpv6: Fix the checksum computation when HW checksum does not apply
While trying an ESP transport mode encryption for UDPv6 packets of
datagram size 1436 with MTU 1500, checksum error was observed in
the secondary fragment.
This error occurs due to the UDP payload checksum being missed out
when computing the full checksum for these packets in
udp6_hwcsum_outgoing().
Fixes:
d39d938c8228 ("ipv6: Introduce udpv6_send_skb()")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 16 Sep 2017 22:47:51 +0000 (15:47 -0700)]
Linux 4.14-rc1
Linus Torvalds [Sat, 16 Sep 2017 19:08:10 +0000 (12:08 -0700)]
Merge tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI updates from Richard Weinberger:
"Minor improvements"
* tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs:
UBI: Fix two typos in comments
ubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
Linus Torvalds [Sat, 16 Sep 2017 19:03:25 +0000 (12:03 -0700)]
Merge branch 'for-linus-4.14-rc1' of git://git./linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
Linus Torvalds [Sat, 16 Sep 2017 18:28:59 +0000 (11:28 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.
2) Fix double-free in rmnet driver, from Dan Carpenter.
3) INET connection socket layer can double put request sockets, fix
from Eric Dumazet.
4) Don't match collect metadata-mode tunnels if the device is down,
from Haishuang Yan.
5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
be2net driver, from Suresh Reddy.
6) Fix scaling error in gen_estimator, from Eric Dumazet.
7) Fix 64-bit statistics deadlock in systemport driver, from Florian
Fainelli.
8) Fix use-after-free in sctp_sock_dump, from Xin Long.
9) Reject invalid BPF_END instructions in verifier, from Edward Cree.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
Documentation: link in networking docs
tcp: fix data delivery rate
bpf/verifier: reject BPF_ALU64|BPF_END
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp: fix an use-after-free issue in sctp_sock_dump
netvsc: increase default receive buffer size
tcp: update skb->skb_mstamp more carefully
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
net: smsc911x: Quieten netif during suspend
net: systemport: Fix 64-bit stats deadlock
net: vrf: avoid gcc-4.6 warning
qed: remove unnecessary call to memset
tg3: clean up redundant initialization of tnapi
tls: make tls_sw_free_resources static
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
MAINTAINERS: review Renesas DT bindings as well
net_sched: gen_estimator: fix scaling error in bytes/packets samples
nfp: wait for the NSP resource to appear on boot
nfp: wait for board state before talking to the NSP
...
Linus Torvalds [Sat, 16 Sep 2017 18:24:26 +0000 (11:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull more input updates from Dmitry Torokhov:
"A second round of updates for the input subsystem:
- a new driver for PWM-controlled vibrators
- ucb1400 touchscreen driver had completely busted suspend/resume
handling
- we now handle "home" button found on some devices with Goodix
touchscreens
- assorted other fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Gigabyte P57 to the keyboard reset table
Input: xpad - validate USB endpoint type during probe
Input: ucb1400_ts - fix suspend and resume handling
Input: edt-ft5x06 - fix access to non-existing register
Input: elantech - make arrays debounce_packet static, reduces object code size
Input: surface3_spi - make const array header static, reduces object code size
Input: goodix - add support for capacitive home button
Input: add a driver for PWM controllable vibrators
Input: adi - make array seq static, reduces object code size
Markus Trippelsdorf [Sat, 16 Sep 2017 09:01:16 +0000 (11:01 +0200)]
firmware: Restore support for built-in firmware
Commit
5620a0d1aac ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
Fixes:
5620a0d1aac ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ido Schimmel [Fri, 15 Sep 2017 13:31:07 +0000 (15:31 +0200)]
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.
This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.
Fixes:
65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Sat, 16 Sep 2017 14:28:02 +0000 (16:28 +0200)]
Documentation: link in networking docs
Fix link in filter.txt.
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 15 Sep 2017 23:47:42 +0000 (16:47 -0700)]
tcp: fix data delivery rate
Now skb->mstamp_skb is updated later, we also need to call
tcp_rate_skb_sent() after the update is done.
Fixes:
8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 16 Sep 2017 03:43:33 +0000 (20:43 -0700)]
Merge branch '4.14-features' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for 4.14 for MIPS; below a summary of
the non-merge commits:
CM:
- Rename mips_cm_base to mips_gcr_base
- Specify register size when generating accessors
- Use BIT/GENMASK for register fields, order & drop shifts
- Add cluster & block args to mips_cm_lock_other()
CPC:
- Use common CPS accessor generation macros
- Use BIT/GENMASK for register fields, order & drop shifts
- Introduce register modify (set/clear/change) accessors
- Use change_*, set_* & clear_* where appropriate
- Add CM/CPC 3.5 register definitions
- Use GlobalNumber macros rather than magic numbers
- Have asm/mips-cps.h include CM & CPC headers
- Cluster support for topology functions
- Detect CPUs in secondary clusters
CPS:
- Read GIC_VL_IDENT directly, not via irqchip driver
DMA:
- Consolidate coherent and non-coherent dma_alloc code
- Don't use dma_cache_sync to implement fd_cacheflush
FPU emulation / FP assist code:
- Another series of 14 commits fixing corner cases such as NaN
propgagation and other special input values.
- Zero bits 32-63 of the result for a CLASS.D instruction.
- Enhanced statics via debugfs
- Do not use bools for arithmetic. GCC 7.1 moans about this.
- Correct user fault_addr type
Generic MIPS:
- Enhancement of stack backtraces
- Cleanup from non-existing options
- Handle non word sized instructions when examining frame
- Fix detection and decoding of ADDIUSP instruction
- Fix decoding of SWSP16 instruction
- Refactor handling of stack pointer in get_frame_info
- Remove unreachable code from force_fcr31_sig()
- Convert to using %pOF instead of full_name
- Remove the R6000 support.
- Move FP code from *_switch.S to *_fpu.S
- Remove unused ST_OFF from r2300_switch.S
- Allow platform to specify multiple its.S files
- Add #includes to various files to ensure code builds reliable and
without warning..
- Remove __invalidate_kernel_vmap_range
- Remove plat_timer_setup
- Declare various variables & functions static
- Abstract CPU core & VP(E) ID access through accessor functions
- Store core & VP IDs in GlobalNumber-style variable
- Unify checks for sibling CPUs
- Add CPU cluster number accessors
- Prevent direct use of generic_defconfig
- Make CONFIG_MIPS_MT_SMP default y
- Add __ioread64_copy
- Remove unnecessary inclusions of linux/irqchip/mips-gic.h
GIC:
- Introduce asm/mips-gic.h with accessor functions
- Use new GIC accessor functions in mips-gic-timer
- Remove counter access functions from irq-mips-gic.c
- Remove gic_read_local_vp_id() from irq-mips-gic.c
- Simplify shared interrupt pending/mask reads in irq-mips-gic.c
- Simplify gic_local_irq_domain_map() in irq-mips-gic.c
- Drop gic_(re)set_mask() functions in irq-mips-gic.c
- Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
- Convert remaining shared reg access, local int mask access and
remaining local reg access to new accessors
- Move GIC_LOCAL_INT_* to asm/mips-gic.h
- Remove GIC_CPU_INT* macros from irq-mips-gic.c
- Move various definitions to the driver
- Remove gic_get_usm_range()
- Remove __gic_irq_dispatch() forward declaration
- Remove gic_init()
- Use mips_gic_present() in place of gic_present and remove
gic_present
- Move gic_get_c0_*_int() to asm/mips-gic.h
- Remove linux/irqchip/mips-gic.h
- Inline __gic_init()
- Inline gic_basic_init()
- Make pcpu_masks a per-cpu variable
- Use pcpu_masks to avoid reading GIC_SH_MASK*
- Clean up mti, reserved-cpu-vectors handling
- Use cpumask_first_and() in gic_set_affinity()
- Let the core set struct irq_common_data affinity
microMIPS:
- Fix microMIPS stack unwinding on big endian systems
MIPS-GIC:
- SYNC after enabling GIC region
NUMA:
- Remove the unused parent_node() macro
R6:
- Constify r2_decoder_tables
- Add accessor & bit definitions for GlobalNumber
SMP:
- Constify smp ops
- Allow boot_secondary SMP op to return errors
VDSO:
- Drop gic_get_usm_range() usage
- Avoid use of linux/irqchip/mips-gic.h
Platform changes:
Alchemy:
- Add devboard machine type to cpuinfo
- update cpu feature overrides
- Threaded carddetect irqs for devboards
AR7:
- allow NULL clock for clk_get_rate
BCM63xx:
- Fix ENETDMA_6345_MAXBURST_REG offset
- Allow NULL clock for clk_get_rate
CI20:
- Enable GPIO and RTC drivers in defconfig
- Add ethernet and fixed-regulator nodes to DTS
Generic platform:
- Move Boston and NI 169445 FIT image source to their own files
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Allow filtering enabled boards by requirements
- Don't explicitly disable CONFIG_USB_SUPPORT
- Bump default NR_CPUS to 16
JZ4700:
- Probe the jz4740-rtc driver from devicetree
Lantiq:
- Drop check of boot select from the spi-falcon driver.
- Drop check of boot select from the lantiq-flash MTD driver.
- Access boot cause register in the watchdog driver through regmap
- Add device tree binding documentation for the watchdog driver
- Add docs for the RCU DT bindings.
- Convert the fpi bus driver to a platform_driver
- Remove ltq_reset_cause() and ltq_boot_select(
- Switch to a proper reset driver
- Switch to a new drivers/soc GPHY driver
- Add an USB PHY driver for the Lantiq SoCs using the RCU module
- Use of_platform_default_populate instead of __dt_register_buses
- Enable MFD_SYSCON to be able to use it for the RCU MFD
- Replace ltq_boot_select() with dummy implementation.
Loongson 2F:
- Allow NULL clock for clk_get_rate
Malta:
- Use new GIC accessor functions
NI 169445:
- Add support for NI 169445 board.
- Only include in 32r2el kernels
Octeon:
- Add support for watchdog of 78XX SOCs.
- Add support for watchdog of CN68XX SOCs.
- Expose support for mips32r1, mips32r2 and mips64r1
- Enable more drivers in config file
- Add support for accessing the boot vector.
- Remove old boot vector code from watchdog driver
- Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
- Make CSR functions node aware.
- Allow access to CIU3 IRQ domains.
- Misc cleanups in the watchdog driver
Omega2+:
- New board, add support and defconfig
Pistachio:
- Enable Root FS on NFS in defconfig
Ralink:
- Add Mediatek MT7628A SoC
- Allow NULL clock for clk_get_rate
- Explicitly request exclusive reset control in the pci-mt7620 PCI driver.
SEAD3:
- Only include in 32 bit kernels by default
VoCore:
- Add VoCore as a vendor t0 dt-bindings
- Add defconfig file"
* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
MIPS: Refactor handling of stack pointer in get_frame_info
MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
MIPS: microMIPS: Fix decoding of swsp16 instruction
MIPS: microMIPS: Fix decoding of addiusp instruction
MIPS: microMIPS: Fix detection of addiusp instruction
MIPS: Handle non word sized instructions when examining frame
MIPS: ralink: allow NULL clock for clk_get_rate
MIPS: Loongson 2F: allow NULL clock for clk_get_rate
MIPS: BCM63XX: allow NULL clock for clk_get_rate
MIPS: AR7: allow NULL clock for clk_get_rate
MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
mips: Save all registers when saving the frame
MIPS: Add DWARF unwinding to assembly
MIPS: Make SAVE_SOME more standard
MIPS: Fix issues in backtraces
MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
MIPS: Ci20: Enable RTC driver
watchdog: octeon-wdt: Add support for 78XX SOCs.
watchdog: octeon-wdt: Add support for cn68XX SOCs.
watchdog: octeon-wdt: File cleaning.
...
Linus Torvalds [Sat, 16 Sep 2017 03:25:06 +0000 (20:25 -0700)]
Merge tag 'pci-v4.14-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Revert an attempt to fix a race while enabling upstream bridges
because it broke iwlwifi firmware loading"
* tag 'pci-v4.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Avoid race while enabling upstream bridges"
Linus Torvalds [Sat, 16 Sep 2017 00:52:52 +0000 (17:52 -0700)]
Merge tag 'drm-fixes-for-v4.14-rc1' of git://people.freedesktop.org/~airlied/linux
Pull drm AMD fixes from Dave Airlie:
"Just had a single AMD fixes pull from Alex for rc1"
* tag 'drm-fixes-for-v4.14-rc1' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
drm/amdgpu: remove duplicate return statement
drm/amdgpu: check memory allocation failure
drm/amd/amdgpu: fix BANK_SELECT on Vega10 (v2)
drm/amdgpu: inline amdgpu_ttm_do_bind again
drm/amdgpu: fix amdgpu_ttm_bind
drm/amdgpu: remove the GART copy hack
drm/ttm:fix wrong decoding of bo_count
drm/ttm: fix missing inc bo_count
drm/amdgpu: set sched_hw_submission higher for KIQ (v3)
drm/amdgpu: move default gart size setting into gmc modules
drm/amdgpu: refine default gart size
drm/amd/powerplay: ACG frequency added in PPTable
drm/amdgpu: discard commands of killed processes
drm/amdgpu: fix and cleanup shadow handling
drm/amdgpu: add automatic per asic settings for gart_size
drm/amdgpu/gfx8: fix spelling typo in mqd allocation
drm/amd/powerplay: unhalt mec after loading
drm/amdgpu/virtual_dce: Virtual display doesn't support disable vblank immediately
drm/amdgpu: Fix huge page updates with CPU
Linus Torvalds [Sat, 16 Sep 2017 00:49:46 +0000 (17:49 -0700)]
Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"I2C has two more new drivers: Altera FPGA and STM32F7"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-stm32f7: add driver
i2c: i2c-stm32f4: use generic definition of speed enum
dt-bindings: i2c-stm32: Document the STM32F7 I2C bindings
i2c: altera: Add Altera I2C Controller driver
dt-bindings: i2c: Add Altera I2C Controller
Linus Torvalds [Fri, 15 Sep 2017 22:43:55 +0000 (15:43 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
- PPC bugfixes
- RCU splat fix
- swait races fix
- pointless userspace-triggerable BUG() fix
- misc fixes for KVM_RUN corner cases
- nested virt correctness fixes + one host DoS
- some cleanups
- clang build fix
- fix AMD AVIC with default QEMU command line options
- x86 bugfixes
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
kvm,mips: Fix potential swait_active() races
kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
kvm: Serialize wq active checks in kvm_vcpu_wake_up()
kvm,x86: Fix apf_task_wake_one() wq serialization
kvm,lapic: Justify use of swait_active()
kvm,async_pf: Use swq_has_sleeper()
sched/wait: Add swq_has_sleeper()
KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
kvm: nVMX: Don't allow L2 to access the hardware CR8
KVM: trace events: update list of exit reasons
KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
KVM: X86: Don't block vCPU if there is pending exception
KVM: SVM: Add irqchip_split() checks before enabling AVIC
KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
KVM: x86: fix clang build
...
Edward Cree [Fri, 15 Sep 2017 13:37:38 +0000 (14:37 +0100)]
bpf/verifier: reject BPF_ALU64|BPF_END
Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
Fixes:
17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 15 Sep 2017 03:02:48 +0000 (11:02 +0800)]
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp_diag would not actually dump out sk/asoc if inet_sctp_diag_fill
returns err, in which case it shouldn't mark sk dumped by setting
cb->args[3] as 1 in sctp_sock_dump().
Otherwise, it could cause some asocs to have no parent's sk dumped
in 'ss --sctp'.
So this patch is to not set cb->args[3] when inet_sctp_diag_fill()
returns err in sctp_sock_dump().
Fixes:
8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 15 Sep 2017 03:02:21 +0000 (11:02 +0800)]
sctp: fix an use-after-free issue in sctp_sock_dump
Commit
86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the
dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
with holding sock and sock lock.
But Paolo noticed that endpoint could be destroyed in sctp_rcv without
sock lock protection. It means the use-after-free issue still could be
triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
!ep, although it's pretty hard to reproduce.
I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
and sctp_sock_dump long time.
This patch is to add another param cb_done to sctp_for_each_transport
and dump ep->assocs with holding tsp after jumping out of transport's
traversal in it to avoid this issue.
It can also improve sctp diag dump to make it run faster, as no need
to save sk into cb->args[5] and keep calling sctp_for_each_transport
any more.
This patch is also to use int * instead of int for the pos argument
in sctp_for_each_transport, which could make postion increment only
in sctp_for_each_transport and no need to keep changing cb->args[2]
in sctp_sock_filter and sctp_sock_dump any more.
Fixes:
86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Thu, 14 Sep 2017 16:31:07 +0000 (09:31 -0700)]
netvsc: increase default receive buffer size
The default receive buffer size was reduced by recent change
to a value which was appropriate for 10G and Windows Server 2016.
But the value is too small for full performance with 40G on Azure.
Increase the default back to maximum supported by host.
Fixes:
8b5327975ae1 ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 14 Sep 2017 03:30:39 +0000 (20:30 -0700)]
tcp: update skb->skb_mstamp more carefully
liujian reported a problem in TCP_USER_TIMEOUT processing with a patch
in tcp_probe_timer() :
https://www.spinics.net/lists/netdev/msg454496.html
After investigations, the root cause of the problem is that we update
skb->skb_mstamp of skbs in write queue, even if the attempt to send a
clone or copy of it failed. One reason being a routing problem.
This patch prevents this, solving liujian issue.
It also removes a potential RTT miscalculation, since
__tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with
TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has
been changed.
A future ACK would then lead to a very small RTT sample and min_rtt
would then be lowered to this too small value.
Tested:
# cat user_timeout.pkt
--local_ip=192.168.102.64
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`
+0 < S 0:0(0) win 0 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
+.1 < . 1:1(0) ack 1 win 65530
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
+0 write(4, ..., 24) = 24
+0 > P. 1:25(24) ack 1 win 29200
+.1 < . 1:1(0) ack 25 win 65530
//change the ipaddress
+1 `ifconfig tun0 192.168.0.10/16`
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+0 `ifconfig tun0 192.168.102.64/16`
+0 < . 1:2(1) ack 25 win 65530
+0 `ifconfig tun0 192.168.0.10/16`
+3 write(4, ..., 24) = -1
# ./packetdrill user_timeout.pkt
Signed-off-by: Eric Dumazet <edumazet@googl.com>
Reported-by: liujian <liujian56@huawei.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 14 Sep 2017 00:11:37 +0000 (17:11 -0700)]
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
rt_iif is only set to the actual egress device for the output path. The
recent change to consider the l3slave flag when returning IP_PKTINFO
works for local traffic (the correct device index is returned), but it
broke the more typical use case of packets received from a remote host
always returning the VRF index rather than the original ingress device.
Update the fixup to consider l3slave and rt_iif actually getting set.
Fixes:
1dfa76390bf05 ("net: ipv4: add check for l3slave for index returned in IP_PKTINFO")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Wed, 13 Sep 2017 17:42:05 +0000 (19:42 +0200)]
net: smsc911x: Quieten netif during suspend
If the network interface is kept running during suspend, the net core
may call net_device_ops.ndo_start_xmit() while the Ethernet device is
still suspended, which may lead to a system crash.
E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock. If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.
As this is a race condition with a small time window, it is not so easy
to trigger at will. Using pm_test may increase your chances:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
To fix this, make sure the network interface is quietened during
suspend.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 12 Sep 2017 20:14:26 +0000 (13:14 -0700)]
net: systemport: Fix 64-bit stats deadlock
We can enter a deadlock situation because there is no sufficient protection
when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
contexts running in softirq, this can lead to the following lockdep splat and
actual deadlock was experienced as well with an iperf session in the background
and a while loop doing ifconfig + ethtool.
[ 5.780350] ================================
[ 5.784679] WARNING: inconsistent lock state
[ 5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[ 5.794561] --------------------------------
[ 5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 5.810175] (&syncp->seq#2){+.?...}, at: [<
c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54
[ 5.818327] {SOFTIRQ-ON-W} state was registered at:
[ 5.823278] bcm_sysport_get_stats64+0x17c/0x258
[ 5.828053] dev_get_stats+0x38/0xac
[ 5.831776] rtnl_fill_stats+0x30/0x118
[ 5.835761] rtnl_fill_ifinfo+0x538/0xe24
[ 5.839921] rtmsg_ifinfo_build_skb+0x6c/0xd8
[ 5.844430] rtmsg_ifinfo_event.part.5+0x14/0x44
[ 5.849201] rtmsg_ifinfo+0x20/0x28
[ 5.852837] register_netdevice+0x628/0x6b8
[ 5.857171] register_netdev+0x14/0x24
[ 5.861051] bcm_sysport_probe+0x30c/0x438
[ 5.865280] platform_drv_probe+0x50/0xb0
[ 5.869418] driver_probe_device+0x2e8/0x450
[ 5.873817] __driver_attach+0x104/0x120
[ 5.877871] bus_for_each_dev+0x7c/0xc0
[ 5.881834] bus_add_driver+0x1b0/0x270
[ 5.885797] driver_register+0x78/0xf4
[ 5.889675] do_one_initcall+0x54/0x190
[ 5.893646] kernel_init_freeable+0x144/0x1d0
[ 5.898135] kernel_init+0x8/0x110
[ 5.901665] ret_from_fork+0x14/0x2c
[ 5.905363] irq event stamp: 24263
[ 5.908804] hardirqs last enabled at (24262): [<
c08eecf0>] net_rx_action+0xc4/0x4e4
[ 5.916624] hardirqs last disabled at (24263): [<
c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98
[ 5.925143] softirqs last enabled at (24258): [<
c022a7fc>] irq_enter+0x84/0x98
[ 5.932524] softirqs last disabled at (24259): [<
c022a918>] irq_exit+0x108/0x16c
[ 5.939985]
[ 5.939985] other info that might help us debug this:
[ 5.946576] Possible unsafe locking scenario:
[ 5.946576]
[ 5.952556] CPU0
[ 5.955031] ----
[ 5.957506] lock(&syncp->seq#2);
[ 5.960955] <Interrupt>
[ 5.963604] lock(&syncp->seq#2);
[ 5.967227]
[ 5.967227] *** DEADLOCK ***
[ 5.967227]
[ 5.973222] 1 lock held by swapper/0/0:
[ 5.977092] #0: (&(&ring->lock)->rlock){..-...}, at: [<
c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54
So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
since it does not appear to be useful for anything. No inconsistency was
observed with either ifconfig or ethtool, global TX counts equal the sum of
per-queue TX counts on a 32-bit architecture.
Fixes:
10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 12 Sep 2017 20:10:53 +0000 (22:10 +0200)]
net: vrf: avoid gcc-4.6 warning
When building an allmodconfig kernel with gcc-4.6, we get a rather
odd warning:
drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]
I have no idea what this warning is even trying to say, but it does
seem like a false positive. Reordering the initialization in to match
the structure definition gets rid of the warning, and might also avoid
whatever gcc thinks is wrong here.
Fixes:
9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himanshu Jha [Tue, 12 Sep 2017 11:19:22 +0000 (16:49 +0530)]
qed: remove unnecessary call to memset
call to memset to assign 0 value immediately after allocating
memory with kzalloc is unnecesaary as kzalloc allocates the memory
filled with 0 value.
Semantic patch used to resolve this issue:
@@
expression e,e2; constant c;
statement S;
@@
e = kzalloc(e2, c);
if(e == NULL) S
- memset(e, 0, e2);
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Acked-by: Sudarsana Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 15 Sep 2017 19:58:55 +0000 (12:58 -0700)]
Merge tag 'firmware_removal-4.14-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull firmware removal from Greg KH:
"Many many years ago (at the kernel summit in Boston), we all came to
the agreement that the firmware/ tree should be dropped from the
kernel, and everyone use the linux-firmware package instead. For some
minor reason, David Woodhouse didn't send the pull request at that
point in time, and everyone forgot about this.
The topic came up in the hallway track at the Plumbers conference this
week, so here's a single patch that drops the whole firmware tree. The
last firmware update was back in 2013, and all distros have been using
linux-firmware instead since at least that year, if not before. The
only commits to that directory since 2013 was some kbuild fixups for
various build tool issues.
So lets finally drop this, we don't need to lug them around in the
kernel source tree anymore, especially as no one wants or uses them.
This has passed build testing with 0-day, I don't think it made it
into linux-next this week, but I figured it was good to get in before
4.14-rc1 was out"
* tag 'firmware_removal-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware: delete in-kernel firmware
Linus Torvalds [Fri, 15 Sep 2017 19:47:21 +0000 (12:47 -0700)]
Merge tag 'nios2-v4.14-rc1' of git://git./linux/kernel/git/lftan/nios2
Pull arch/nios2 update from Ley Foon Tan.
* tag 'nios2-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: time: Read timer in get_cycles only if initialized
nios2: add earlycon support to 3c120 devboard DTS
Linus Torvalds [Fri, 15 Sep 2017 19:44:59 +0000 (12:44 -0700)]
Merge tag 'powerpc-4.14-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Just one fix, for the handling of alignment interrupts on dcbz
instructions.
Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka"
* tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix handling of alignment interrupt on dcbz instruction
Linus Torvalds [Fri, 15 Sep 2017 19:16:18 +0000 (12:16 -0700)]
Merge tag 'for-linus-4.14-ofs2' of git://git./linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Some cleanups and a big bug fix for ACLs.
When I was reviewing Jan Kara's ACL patch, I realized that Orangefs
ACL code was busted, not just in the kernel module, but in the server
as well. I've been working on the code in the server mostly, but
here's one kernel patch, there will be more"
* tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Adjust three checks for null pointers
orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
orangefs: Delete error messages for a failed memory allocation in five functions
orangefs: constify xattr_handler structure
orangefs: don't call filemap_write_and_wait from fsync
orangefs: off by ones in xattr size checks
orangefs: documentation clean up
orangefs: react properly to posix_acl_update_mode's aftermath.
orangefs: Don't clear SGID when inheriting ACLs
Dmitry Torokhov [Fri, 15 Sep 2017 16:52:21 +0000 (09:52 -0700)]
Merge branch 'next' into for-linus
Prepare second round of input updates for 4.14 merge window.
Kai-Heng Feng [Fri, 15 Sep 2017 16:36:16 +0000 (09:36 -0700)]
Input: i8042 - add Gigabyte P57 to the keyboard reset table
Similar to other Gigabyte laptops, the touchpad on P57 requires a
keyboard reset to detect Elantech touchpad correctly.
BugLink: https://bugs.launchpad.net/bugs/1594214
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jim Mattson [Thu, 14 Sep 2017 23:31:44 +0000 (16:31 -0700)]
kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.
The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Thu, 14 Sep 2017 23:31:42 +0000 (16:31 -0700)]
kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.
On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Thu, 14 Sep 2017 23:31:40 +0000 (16:31 -0700)]
kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:24 +0000 (13:08 -0700)]
kvm,mips: Fix potential swait_active() races
For example, the following could occur, making us miss a wakeup:
CPU0 CPU1
kvm_vcpu_block kvm_mips_comparecount_func
[L] swait_active(&vcpu->wq)
[S] prepare_to_swait(&vcpu->wq)
[L] if (!kvm_vcpu_has_pending_timer(vcpu))
schedule() [S] queue_timer_int(vcpu)
Ensure that the swait_active() check is not hoisted over the interrupt.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:23 +0000 (13:08 -0700)]
kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:22 +0000 (13:08 -0700)]
kvm: Serialize wq active checks in kvm_vcpu_wake_up()
This is a generic call and can be suceptible to races
in reading the wq task_list while another task is adding
itself to the list. Add a full barrier by using the
swq_has_sleeper() helper.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:21 +0000 (13:08 -0700)]
kvm,x86: Fix apf_task_wake_one() wq serialization
During code inspection, the following potential race was seen:
CPU0 CPU1
kvm_async_pf_task_wait apf_task_wake_one
[L] swait_active(&n->wq)
[S] prepare_to_swait(&n.wq)
[L] if (!hlist_unhahed(&n.link))
schedule() [S] hlist_del_init(&n->link);
Properly serialize swait_active() checks such that a wakeup is
not missed.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:20 +0000 (13:08 -0700)]
kvm,lapic: Justify use of swait_active()
A comment might serve future readers.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:19 +0000 (13:08 -0700)]
kvm,async_pf: Use swq_has_sleeper()
... as we've got the new helper now. This caller already
does the right thing, hence no changes in semantics.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Davidlohr Bueso [Wed, 13 Sep 2017 20:08:18 +0000 (13:08 -0700)]
sched/wait: Add swq_has_sleeper()
Which is the equivalent of what we have in regular waitqueues.
I'm not crazy about the name, but this also helps us get both
apis closer -- which iirc comes originally from the -net folks.
We also duplicate the comments for the lockless swait_active(),
from wait.h. Future users will make use of this interface.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan H. Schönherr [Thu, 7 Sep 2017 18:02:30 +0000 (19:02 +0100)]
KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)
Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).
This fixes CVE-2017-1000252.
Fixes:
efc644048ecde54 ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan H. Schönherr [Thu, 7 Sep 2017 18:02:48 +0000 (19:02 +0100)]
KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
We cannot add routes for gsi values >= KVM_MAX_IRQ_ROUTES -- see
kvm_set_irq_routing(). Hence, there is no sense in accepting them
via KVM_IRQFD. Prevent them from entering the system in the first
place.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guenter Roeck [Tue, 12 Sep 2017 03:45:26 +0000 (20:45 -0700)]
nios2: time: Read timer in get_cycles only if initialized
Mainline crashes as follows when running nios2 images.
On node 0 totalpages: 65536
free_area_init_node: node 0, pgdat
c8408fa0, node_mem_map
c8726000
Normal zone: 512 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 65536 pages, LIFO batch:15
Unable to handle kernel NULL pointer dereference at virtual address
00000000
ea =
c8003cb0, ra =
c81cbf40, cause = 15
Kernel panic - not syncing: Oops
Problem is seen because get_cycles() is called before the timer it depends
on is initialized. Returning 0 in that situation fixes the problem.
Fixes:
33d72f3822d7 ("init/main.c: extract early boot entropy from the ..")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tobias Klauser [Tue, 20 Jun 2017 07:43:18 +0000 (09:43 +0200)]
nios2: add earlycon support to 3c120 devboard DTS
Allow earlycon to be used on the JTAG UART present in the 3c120 GHRD.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Jim Mattson [Tue, 12 Sep 2017 20:02:54 +0000 (13:02 -0700)]
kvm: nVMX: Don't allow L2 to access the hardware CR8
If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
This fixes CVE-2017-12154.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vladis Dronov [Tue, 12 Sep 2017 22:21:21 +0000 (00:21 +0200)]
nl80211: check for the required netlink attributes presence
nl80211_set_rekey_data() does not check if the required attributes
NL80211_REKEY_DATA_{REPLAY_CTR,KEK,KCK} are present when processing
NL80211_CMD_SET_REKEY_OFFLOAD request. This request can be issued by
users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attributes presence.
This patch is based on the patch by bo Zhang.
This fixes CVE-2017-12153.
References: https://bugzilla.redhat.com/show_bug.cgi?id=1491046
Fixes:
e5497d766ad ("cfg80211/nl80211: support GTK rekey offload")
Cc: <stable@vger.kernel.org> # v3.1-rc1
Reported-by: bo Zhang <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bjorn Helgaas [Fri, 15 Sep 2017 06:33:51 +0000 (01:33 -0500)]
Revert "PCI: Avoid race while enabling upstream bridges"
This reverts commit
40f11adc7cd9281227f0a6a627d966dd0a5f0cd9.
Jens found that iwlwifi firmware loading failed on a Lenovo X1 Carbon,
gen4:
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-34.ucode failed with error -2
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-33.ucode failed with error -2
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-32.ucode failed with error -2
iwlwifi 0000:04:00.0: loaded firmware version 31.532993.0 op_mode iwlmvm
iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
...
iwlwifi 0000:04:00.0: Failed to load firmware chunk!
iwlwifi 0000:04:00.0: Could not load the [0] uCode section
iwlwifi 0000:04:00.0: Failed to start INIT ucode: -110
iwlwifi 0000:04:00.0: Failed to run INIT ucode: -110
He bisected it to
40f11adc7cd9 ("PCI: Avoid race while enabling upstream
bridges"). Revert that commit to fix the regression.
Link: http://lkml.kernel.org/r/4bcbcbc1-7c79-09f0-5071-bc2f53bf6574@kernel.dk
Fixes:
40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Luca Coelho <luca@coelho.fi>
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Mimi Zohar [Wed, 13 Sep 2017 02:45:33 +0000 (22:45 -0400)]
vfs: constify path argument to kernel_read_file_from_path
This patch constifies the path argument to kernel_read_file_from_path().
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 15 Sep 2017 03:04:32 +0000 (20:04 -0700)]
Merge tag 'nfs-for-4.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull more NFS client updates from Trond Myklebust:
"Hightlights include:
Bugfixes:
- Various changes relating to reporting IO errors.
- pnfs: Use the standard I/O stateid when calling LAYOUTGET
Features:
- Add static NFS I/O tracepoints for debugging"
* tag 'nfs-for-4.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: various changes relating to reporting IO errors.
NFS: Add static NFS I/O tracepoints
pNFS: Use the standard I/O stateid when calling LAYOUTGET
Linus Torvalds [Fri, 15 Sep 2017 03:01:41 +0000 (20:01 -0700)]
Merge branch 'work.misc' of git://git./linux/kernel/git/viro/vfs
Pull misc leftovers from Al Viro.
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix the __user misannotations in asm-generic get_user/put_user
fput: Don't reinvent the wheel but use existing llist API
namespace.c: Don't reinvent the wheel but use existing llist API
Linus Torvalds [Fri, 15 Sep 2017 02:29:55 +0000 (19:29 -0700)]
Merge branch 'work.read_write' of git://git./linux/kernel/git/viro/vfs
Pull nowait read support from Al Viro:
"Support IOCB_NOWAIT for buffered reads and block devices"
* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
block_dev: support RFW_NOWAIT on block device nodes
fs: support RWF_NOWAIT for buffered reads
fs: support IOCB_NOWAIT in generic_file_buffered_read
fs: pass iocb to do_generic_file_read
Linus Torvalds [Fri, 15 Sep 2017 01:54:01 +0000 (18:54 -0700)]
Merge branch 'work.mount' of git://git./linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).
This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like
list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')
sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
-e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
-e 's/\<MS_NODEV\>/SB_NODEV/g' \
-e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
-e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
-e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
-e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
-e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
-e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
-e 's/\<MS_SILENT\>/SB_SILENT/g' \
-e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
-e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
-e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
-e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
$list
and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
Linus Torvalds [Fri, 15 Sep 2017 01:13:32 +0000 (18:13 -0700)]
Merge branch 'work.set_fs' of git://git./linux/kernel/git/viro/vfs
Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
Linus Torvalds [Fri, 15 Sep 2017 00:37:26 +0000 (17:37 -0700)]
Merge branch 'work.ipc' of git://git./linux/kernel/git/viro/vfs
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
"IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
Dinamani"
* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
utimes: Make utimes y2038 safe
ipc: shm: Make shmid_kernel timestamps y2038 safe
ipc: sem: Make sem_array timestamps y2038 safe
ipc: msg: Make msg_queue timestamps y2038 safe
ipc: mqueue: Replace timespec with timespec64
ipc: Make sys_semtimedop() y2038 safe
get rid of SYSVIPC_COMPAT on ia64
semtimedop(): move compat to native
shmat(2): move compat to native
msgrcv(2), msgsnd(2): move compat to native
ipc(2): move compat to native
ipc: make use of compat ipc_perm helpers
semctl(): move compat to native
semctl(): separate all layout-dependent copyin/copyout
msgctl(): move compat to native
msgctl(): split the actual work from copyin/copyout
ipc: move compat shmctl to native
shmctl: split the work from copyin/copyout
Linus Torvalds [Fri, 15 Sep 2017 00:30:49 +0000 (17:30 -0700)]
Merge branch 'zstd-minimal' of git://git./linux/kernel/git/mason/linux-btrfs
Pull zstd support from Chris Mason:
"Nick Terrell's patch series to add zstd support to the kernel has been
floating around for a while. After talking with Dave Sterba, Herbert
and Phillip, we decided to send the whole thing in as one pull
request.
zstd is a big win in speed over zlib and in compression ratio over
lzo, and the compression team here at FB has gotten great results
using it in production. Nick will continue to update the kernel side
with new improvements from the open source zstd userland code.
Nick has a number of benchmarks for the main zstd code in his lib/zstd
commit:
I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB
of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using
`silesia.tar` [3], which is 211,988,480 B large. Run the following
commands for the benchmark:
sudo modprobe zstd_compress_test
sudo mknod zstd_compress_test c 245 0
sudo cp silesia.tar zstd_compress_test
The time is reported by the time of the userland `cp`.
The MB/s is computed with
1,536,217,008 B / time(buffer size, hash)
which includes the time to copy from userland.
The Adjusted MB/s is computed with
1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
The memory reported is the amount of memory the compressor
requests.
| Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
|----------|----------|----------|-------|---------|----------|----------|
| none |
11988480 | 0.100 | 1 | 2119.88 | - | - |
| zstd -1 |
73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
| zstd -3 |
66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
| zstd -5 |
65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
| zstd -10 |
60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
| zstd -15 |
58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
| zstd -19 |
54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
| zlib -1 |
77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
| zlib -3 |
72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
| zlib -6 |
68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
| zlib -9 |
67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |
I benchmarked zstd decompression using the same method on the same
machine. The benchmark file is located in the upstream zstd repo
under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
memory reported is the amount of memory required to decompress
data compressed with the given compression level. If you know the
maximum size of your input, you can reduce the memory usage of
decompression irrespective of the compression level.
| Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
|----------|----------|---------|---------------|-------------|
| none | 0.025 | 8479.54 | - | - |
| zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
| zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
| zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
| zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
| zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
| zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
| zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
| zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
| zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
| zlib -9 | 0.837 | 253.27 | 287.64 | 0.04 |
I ran a long series of tests and benchmarks on the btrfs side and the
gains are very similar to the core benchmarks Nick ran"
* 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
squashfs: Add zstd support
btrfs: Add zstd support
lib: Add zstd modules
lib: Add xxhash module
Paul Mackerras [Wed, 13 Sep 2017 04:51:24 +0000 (14:51 +1000)]
powerpc: Fix handling of alignment interrupt on dcbz instruction
This fixes the emulation of the dcbz instruction in the alignment
interrupt handler. The error was that we were comparing just the
instruction type field of op.type rather than the whole thing,
and therefore the comparison "type != CACHEOP + DCBZ" was always
true.
Fixes:
31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Greg Kroah-Hartman [Thu, 14 Sep 2017 21:23:01 +0000 (14:23 -0700)]
firmware: delete in-kernel firmware
The last firmware change for the in-kernel firmware source code was back
in 2013. Everyone has been relying on the out-of-tree linux-firmware
package for a long long time.
So let's drop it, it's baggage we don't need to keep dragging around
(and having to fix random kbuild issues over time...)
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Thu, 14 Sep 2017 20:46:33 +0000 (13:46 -0700)]
Merge tag 'kbuild-v4.14' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Use Make-builtin $(abspath ...) helper to get absolute path
- Add W=2 extra warning option to detect unused macros
- Use more KCONFIG_CONFIG instead hard-coded .config
- Fix bugs of tar*-pkg targets
* tag 'kbuild-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: buildtar: do not print successful message if tar returns error
kbuild: buildtar: fix tar error when CONFIG_MODULES is disabled
kbuild: Use KCONFIG_CONFIG in buildtar
Kbuild: enable -Wunused-macros warning for "make W=2"
kbuild: use $(abspath ...) instead of $(shell cd ... && /bin/pwd)
Linus Torvalds [Thu, 14 Sep 2017 20:43:16 +0000 (13:43 -0700)]
Merge tag 'for-4.14/dm-changes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Some request-based DM core and DM multipath fixes and cleanups
- Constify a few variables in DM core and DM integrity
- Add bufio optimization and checksum failure accounting to DM
integrity
- Fix DM integrity to avoid checking integrity of failed reads
- Fix DM integrity to use init_completion
- A couple DM log-writes target fixes
- Simplify DAX flushing by eliminating the unnecessary flush
abstraction that was stood up for DM's use.
* tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dax: remove the pmem_dax_ops->flush abstraction
dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
dm integrity: make blk_integrity_profile structure const
dm integrity: do not check integrity for failed read operations
dm log writes: fix >512b sectorsize support
dm log writes: don't use all the cpu while waiting to log blocks
dm ioctl: constify ioctl lookup table
dm: constify argument arrays
dm integrity: count and display checksum failures
dm integrity: optimize writing dm-bufio buffers that are partially changed
dm rq: do not update rq partially in each ending bio
dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
dm mpath: complain about unsupported __multipath_map_bio() return values
dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
Linus Torvalds [Thu, 14 Sep 2017 20:33:33 +0000 (13:33 -0700)]
Merge tag 'fbdev-v4.14' of git://github.com/bzolnier/linux
Pull fbdev updates from Bartlomiej Zolnierkiewicz:
- make fbcon a built-time depency for fbdev (fbcon was tristate option
before, now it is a bool) - this is a first step in preparations for
making console_lock usage saner (currently it acts like the BKL for
all things fbdev/fbcon) (Daniel Vetter)
- add fbcon=margin:<color> command line option to select the fbcon
margin color (David Lechner)
- add DMI quirk table for x86 systems which need fbcon rotation
(devices like Asus T100HA, GPD Pocket, the GPD win and the I.T.Works
TW891) (Hans de Goede)
- fix 1bpp logo support for unusual width (needed by LEGO MINDSTORMS
EV3) (David Lechner)
- enable Xilinx FB driver for ARM ZynqMP platform (Michal Simek)
- fix use after free in the error path of udlfb driver (Anton Vasilyev)
- fix error return code handling in pxa3xx_gcu driver (Gustavo A. R.
Silva)
- fix bootparams.screeninfo arguments checking in vgacon (Jan H.
Schönherr)
- do not leak uninitialized padding in clk to userspace in the debug
code of atyfb driver (Vladis Dronov)
- fix compiler warnings in fbcon code and matroxfb driver (Arnd
Bergmann)
- convert fbdev susbsytem to using %pOF instead of full_name (Rob
Herring)
- structures constifications (Arvind Yadav, Bhumika Goyal, Gustavo A.
R. Silva, Julia Lawall)
- misc cleanups (Gustavo A. R. Silva, Hyun Kwon, Julia Lawall, Kuninori
Morimoto, Lynn Lei)
* tag 'fbdev-v4.14' of git://github.com/bzolnier/linux: (75 commits)
video/console: Update BIOS dates list for GPD win console rotation DMI quirk
video/console: Add rotated LCD-panel DMI quirk for the VIOS LTH17
video: fbdev: sis: fix duplicated code for different branches
video: fbdev: make fb_var_screeninfo const
video: fbdev: aty: do not leak uninitialized padding in clk to userspace
vgacon: Prevent faulty bootparams.screeninfo from causing harm
video: fbdev: make fb_videomode const
video/console: Add new BIOS date for GPD pocket to dmi quirk table
fbcon: remove restriction on margin color
video: ARM CLCD: constify amba_id
video: fm2fb: constify zorro_device_id
video: fbdev: annotate fb_fix_screeninfo with const and __initconst
omapfb: constify omap_video_timings structures
video: fbdev: udlfb: Fix use after free on dlfb_usb_probe error path
fbdev: i810: make fb_ops const
fbdev: matrox: make fb_ops const
video: fbdev: pxa3xx_gcu: fix error return code in pxa3xx_gcu_probe()
video: fbdev: Enable Xilinx FB for ZynqMP
video: fbdev: Fix multiple style issues in xilinxfb
video: fbdev: udlfb: constify usb_device_id.
...
Linus Torvalds [Thu, 14 Sep 2017 20:28:30 +0000 (13:28 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add support for the watchdog on Meson8 and Meson8m2
- add support for MediaTek MT7623 and MT7622 SoC
- add support for the r8a77995 wdt
- explicitly request exclusive reset control for asm9260_wdt,
zx2967_wdt, rt2880_wdt and mt7621_wdt
- improvements to asm9260_wdt, aspeed_wdt, renesas_wdt and cadence_wdt
- add support for reading freq via CCF + suspend/resume support for
of_xilinx_wdt
- constify watchdog_ops and various device-id structures
- revert of commit
1fccb73011ea ("iTCO_wdt: all versions count down
twice") (Bug 196509)
* git://www.linux-watchdog.org/linux-watchdog: (40 commits)
watchdog: mei_wdt: constify mei_cl_device_id
watchdog: sp805: constify amba_id
watchdog: ziirave: constify i2c_device_id
watchdog: sc1200: constify pnp_device_id
dt-bindings: watchdog: renesas-wdt: Add support for the r8a77995 wdt
watchdog: renesas_wdt: update copyright dates
watchdog: renesas_wdt: make 'clk' a variable local to probe()
watchdog: renesas_wdt: consistently use RuntimePM for clock management
watchdog: aspeed: Support configuration of external signal properties
dt-bindings: watchdog: aspeed: External reset signal properties
drivers/watchdog: Add optional ASPEED device tree properties
drivers/watchdog: ASPEED reference dev tree properties for config
watchdog: da9063_wdt: Simplify by removing unneeded struct...
watchdog: bcm7038: Check the return value from clk_prepare_enable()
watchdog: qcom: Check for platform_get_resource() failure
watchdog: of_xilinx_wdt: Add suspend/resume support
watchdog: of_xilinx_wdt: Add support for reading freq via CCF
dt-bindings: watchdog: mediatek: add support for MediaTek MT7623 and MT7622 SoC
watchdog: max77620_wdt: constify platform_device_id
watchdog: pcwd_usb: constify usb_device_id
...
Linus Torvalds [Thu, 14 Sep 2017 20:10:48 +0000 (13:10 -0700)]
Merge branch 'dmi-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull dmi update from Jean Delvare:
"Mark all struct dmi_system_id instances const"
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
dmi: Mark all struct dmi_system_id instances const
Linus Torvalds [Thu, 14 Sep 2017 20:01:09 +0000 (13:01 -0700)]
Merge tag 'pinctrl-v4.14-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"This slew of fixes for pin control was noticed and patched up early,
so to get the annoyance out of the way for -rc1 it would make sense to
send them already.
- Fix a build include in the Uniphier driver to keep pace with
ongoing refactorings.
- Fix a slew of minor semantic and syntactic issues as well as
stricting up Kconfig for the new Spreadtrum driver.
- Fix the GPIO interrupt set-up on the Marvell 37xx Armada as fallout
for dynamically allocating irq descriptors from the core. (Also
tagged for stable.)
- Fix AMD register suspend/resume state spool/unspooling so that
wakeup works as it should. (Also tagged for stable.)"
* tag 'pinctrl-v4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl/amd: save pin registers over suspend/resume
pinctrl: armada-37xx: Fix gpio interrupt setup
pinctrl: sprd: fix off by one bugs
pinctrl: sprd: check for allocation failure
pinctrl: sprd: Restrict PINCTRL_SPRD to ARCH_SPRD or COMPILE_TEST
pinctrl: sprd: fix build errors and dependencies
pinctrl: sprd: make three local functions static
pinctrl: uniphier: include <linux/build_bug.h> instead of <linux/bug.h>
Linus Torvalds [Thu, 14 Sep 2017 19:25:34 +0000 (12:25 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"A few leftovers"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_owner: skip unnecessary stack_trace entries
arm64: stacktrace: avoid listing stacktrace functions in stacktrace
mm: treewide: remove GFP_TEMPORARY allocation flag
IB/mlx4: fix sprintf format warning
fscache: fix fscache_objlist_show format processing
lib/test_bitmap.c: use ULL suffix for 64-bit constants
procfs: remove unused variable
drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
idr: remove WARN_ON_ONCE() when trying to replace negative ID
Markus Elfring [Thu, 17 Aug 2017 19:35:16 +0000 (21:35 +0200)]
orangefs: Adjust three checks for null pointers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The script “checkpatch.pl” pointed information out like the following.
Comparison to NULL could be written !…
Thus fix affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Markus Elfring [Thu, 17 Aug 2017 19:18:01 +0000 (21:18 +0200)]
orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Markus Elfring [Thu, 17 Aug 2017 19:00:07 +0000 (21:00 +0200)]
orangefs: Delete error messages for a failed memory allocation in five functions
Omit an extra message for a memory allocation failure in these functions.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Julia Lawall [Wed, 2 Aug 2017 08:35:14 +0000 (10:35 +0200)]
orangefs: constify xattr_handler structure
The xattr_handler structure is only stored in an array of const
structures. Thus the xattr_handler structure itself can be
const.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Jeff Layton [Wed, 12 Apr 2017 12:06:02 +0000 (08:06 -0400)]
orangefs: don't call filemap_write_and_wait from fsync
Orangefs doesn't do buffered writes yet, so there's no point in
initiating and waiting for writeback.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Dan Carpenter [Mon, 22 May 2017 12:08:31 +0000 (15:08 +0300)]
orangefs: off by ones in xattr size checks
A previous patch which claimed to remove off by ones actually introduced
them.
strlen() returns the length of the string not including the NUL
character. We are using strcpy() to copy "name" into a buffer which is
ORANGEFS_MAX_XATTR_NAMELEN characters long. We should make sure to
leave space for the NUL, otherwise we're writing one character beyond
the end of the buffer.
Fixes:
e675c5ec51fe ("orangefs: clean up oversize xattr validation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Mike Marshall [Thu, 10 Aug 2017 17:56:45 +0000 (13:56 -0400)]
orangefs: documentation clean up
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Mike Marshall [Thu, 10 Aug 2017 17:50:50 +0000 (13:50 -0400)]
orangefs: react properly to posix_acl_update_mode's aftermath.
posix_acl_update_mode checks to see if the permissions
described by the ACL can be encoded into the
object's mode. If so, it sets "acl" to NULL
and "mode" to the new desired value. Prior to this patch
we failed to actually propagate the new mode back to the
server.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Jan Kara [Thu, 22 Jun 2017 13:31:13 +0000 (15:31 +0200)]
orangefs: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.
Fix the problem by creating __orangefs_set_acl() function that does not
call posix_acl_update_mode() and use it when inheriting ACLs. That
prevents SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.
Fixes:
073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
CC: Mike Marshall <hubcap@omnibond.com>
CC: pvfs2-developers@beowulf-underground.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Colin Ian King [Thu, 14 Sep 2017 16:01:25 +0000 (17:01 +0100)]
tg3: clean up redundant initialization of tnapi
tnapi is being initialized and then immediately updated and
hence the initialiation is redundant. Clean up the warning
by moving the declaration and initialization to the inside
of the for-loop.
Cleans up clang scan-build warning:
warning: Value stored to 'tnapi' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Chen [Fri, 25 Aug 2017 16:13:55 +0000 (09:13 -0700)]
sched/wait: Introduce wakeup boomark in wake_up_page_bit
Now that we have added breaks in the wait queue scan and allow bookmark
on scan position, we put this logic in the wake_up_page_bit function.
We can have very long page wait list in large system where multiple
pages share the same wait list. We break the wake up walk here to allow
other cpus a chance to access the list, and not to disable the interrupts
when traversing the list for too long. This reduces the interrupt and
rescheduling latency, and excessive page wait queue lock hold time.
[ v2: Remove bookmark_wake_function ]
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Chen [Fri, 25 Aug 2017 16:13:54 +0000 (09:13 -0700)]
sched/wait: Break up long wake list walk
We encountered workloads that have very long wake up list on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.
We saw page wait list that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.
Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.
The page wait list is traversed with interrupt disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.
This patch bookmarks the waker's scan position in wake list and break
the wake up walk, to allow access to the list before the waker resume
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.
This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.
[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
simply access to flags. ]
Reported-by: Kan Liang <kan.liang@intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobias Klauser [Thu, 14 Sep 2017 11:22:25 +0000 (13:22 +0200)]
tls: make tls_sw_free_resources static
Make the needlessly global function tls_sw_free_resources static to fix
a gcc/sparse warning.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ladi Prosek [Thu, 14 Sep 2017 09:50:07 +0000 (11:50 +0200)]
KVM: trace events: update list of exit reasons
Adding entries for exit reasons 23 - 27:
KVM_EXIT_EPR
KVM_EXIT_SYSTEM_EVENT
KVM_EXIT_S390_STSI
KVM_EXIT_IOAPIC_EOI
KVM_EXIT_HYPERV
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Wanpeng Li [Thu, 14 Sep 2017 10:54:16 +0000 (03:54 -0700)]
KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info
ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address
ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2
After running some memory intensive workload in guest, I catch the kworker
which completes the GUP too quickly, and queues an "Page Ready" #PF exception
after the "Page not Present" exception before the next vmentry as the above
trace which will result in #DF injected to guest.
This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes:
7c90705bf2a3 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pierre-Yves MORDRET [Thu, 14 Sep 2017 14:28:37 +0000 (16:28 +0200)]
i2c: i2c-stm32f7: add driver
This patch adds initial support for the STM32F7 I2C controller.
Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pierre-Yves MORDRET [Thu, 14 Sep 2017 14:28:36 +0000 (16:28 +0200)]
i2c: i2c-stm32f4: use generic definition of speed enum
This patch uses a more generic definition of speed enum for i2c-stm32f4
driver.
Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Reviewed-by: Ludovic BARRE <ludovic.barre@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pierre-Yves MORDRET [Thu, 14 Sep 2017 14:28:35 +0000 (16:28 +0200)]
dt-bindings: i2c-stm32: Document the STM32F7 I2C bindings
This patch adds the documentation of device tree bindings for STM32F7 I2C
Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>