David Ahern [Tue, 17 Jan 2017 22:57:36 +0000 (14:57 -0800)]
lwtunnel: fix autoload of lwt modules
Trying to add an mpls encap route when the MPLS modules are not loaded
hangs. For example:
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=m
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
$ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
The ip command hangs:
root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
$ cat /proc/880/stack
[<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
[<ffffffff81065efc>] __request_module+0x27b/0x30a
[<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
[<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
[<ffffffff814ae451>] fib_table_insert+0x90/0x41f
[<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
...
modprobe is trying to load rtnl-lwt-MPLS:
root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
and it hangs after loading mpls_router:
$ cat /proc/881/stack
[<ffffffff81441537>] rtnl_lock+0x12/0x14
[<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
[<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
[<ffffffff81000471>] do_one_initcall+0x8e/0x13f
[<ffffffff81119961>] do_init_module+0x5a/0x1e5
[<ffffffff810bd070>] load_module+0x13bd/0x17d6
...
The problem is that lwtunnel_build_state is called with the rtnl lock
held, preventing mpls_init from registering.
Given the references potentially held by the time lwtunnel_build_state
is called, it cannot drop the rtnl lock to load the module. So, extract the module
loading code from lwtunnel_build_state into a new function to validate
the encap type. The new function is called while converting the user
request into a fib_config which is well before any table, device or
fib entries are examined.
Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 18 Jan 2017 03:07:19 +0000 (22:07 -0500)]
bnxt_en: Fix "uninitialized variable" bug in TPA code path.
In the TPA GRO code path, initialize the tcp_opt_len variable to 0 so
that it will be correct for packets without TCP timestamps. The bug
caused the SKB fields to be incorrectly set up for packets without
TCP timestamps, leading to these packets being rejected by the stack.
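The effect of the initialization can be illustrated with a small userspace sketch. This is not the bnxt_en driver code: the helper name tcp_hdr_len() and the two length constants are assumptions made for illustration only.

```c
#include <stdbool.h>

#define TCP_BASE_HDR_LEN   20  /* TCP header without options */
#define TCP_TSTAMP_OPT_LEN 12  /* 10-byte timestamp option padded to 12 bytes */

/* Hypothetical helper: TCP header length of an aggregated packet. */
static int tcp_hdr_len(bool has_timestamps)
{
    int tcp_opt_len = 0;  /* start from 0 so the no-timestamp case is correct */

    if (has_timestamps)
        tcp_opt_len = TCP_TSTAMP_OPT_LEN;

    return TCP_BASE_HDR_LEN + tcp_opt_len;
}
```

Without the "= 0", the no-timestamp branch would read an indeterminate value and every offset derived from it would be wrong.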
Reported-by: Andy Gospodarek <andrew.gospodarek@broadocm.com>
Acked-by: Andy Gospodarek <andrew.gospodarek@broadocm.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Gonzalez Cabanelas [Wed, 18 Jan 2017 00:26:55 +0000 (16:26 -0800)]
net: phy: bcm63xx: Utilize correct config_intr function
Commit a1cba5613edf ("net: phy: Add Broadcom phy library for common
interfaces") made the BCM63xx PHY driver utilize bcm_phy_config_intr()
which would appear to do the right thing, except that it does not write
to the MII_BCM63XX_IR register but to MII_BCM54XX_ECR which is
different.
This would cause invalid link parameters and events to be
generated by the PHY interrupt.
Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces")
Signed-off-by: Daniel Gonzalez Cabanelas <dgcbueu@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Jan 2017 20:12:17 +0000 (12:12 -0800)]
net: fix harmonize_features() vs NETIF_F_HIGHDMA
Ashizuka reported a highmem oddity and sent a patch for freescale
fec driver.
But the problem root cause is that core networking stack
must ensure no skb with highmem fragment is ever sent through
a device that does not assert NETIF_F_HIGHDMA in its features.
We need to call illegal_highdma() from harmonize_features()
regardless of CSUM checks.
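A minimal userspace sketch of the idea, loosely modeled on the description above. The feature bits, the struct pkt type and the function name are hypothetical stand-ins, not the kernel's definitions; the point is only that the highmem check is a separate "if" rather than an "else" branch of the checksum check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for NETIF_F_* flags. */
#define F_SG      (1u << 0)
#define F_HIGHDMA (1u << 1)
#define F_CSUM    (1u << 2)

/* Stand-in for an skb: only the properties this sketch needs. */
struct pkt {
    bool needs_csum;        /* packet still needs a checksum */
    bool dev_can_csum;      /* device can checksum this protocol */
    bool has_highmem_frag;  /* at least one fragment lives in highmem */
};

static uint32_t harmonize_features_sketch(const struct pkt *p, uint32_t features)
{
    if (p->needs_csum && !p->dev_can_csum)
        features &= ~F_CSUM;

    /* Always evaluated, whatever the checksum check decided. */
    if (p->has_highmem_frag && !(features & F_HIGHDMA))
        features &= ~F_SG;

    return features;
}
```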
Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin Shelar <pshelar@ovn.org>
Reported-by: "Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Jan 2017 20:11:21 +0000 (15:11 -0500)]
Merge branch 'xen-netback-leaks'
Igor Druzhinin says:
====================
xen-netback: fix memory leaks on XenBus disconnect
Just split the initial patch in two as proposed by Wei.
Since the approach for locking netdev statistics is inconsistent (tends not
to have any locking at all) across the kernel, we had better rely on our
internal lock for this purpose.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Druzhinin [Tue, 17 Jan 2017 20:49:38 +0000 (20:49 +0000)]
xen-netback: protect resource cleaning on XenBus disconnect
vif->lock is used to protect statistics gathering agents from using the
queue structure during cleaning.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Druzhinin [Tue, 17 Jan 2017 20:49:37 +0000 (20:49 +0000)]
xen-netback: fix memory leaks on XenBus disconnect
Eliminate memory leaks introduced several years ago by cleaning the
queue resources which are allocated on XenBus connection event. Namely, queue
structure array and pages used for IO rings.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Jan 2017 19:58:24 +0000 (14:58 -0500)]
Merge branch 'ethtool-set-channels-fix'
Tariq Toukan says:
====================
ethtool fix
This patchset from Eran contains a fix to ethtool set_channels, where the call
to get_channels with an uninitialized parameter might result in garbage fields.
It also contains two followup changes in our mlx4/mlx5 Eth drivers.
Series generated against net commit:
0faa9cb5b383 net sched actions: fix refcnt when GETing of action after bind
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Tue, 17 Jan 2017 17:19:19 +0000 (19:19 +0200)]
net/mlx5e: Remove unnecessary checks when setting num channels
Boundary checks for the number of RX and TX channels should be done by the
caller and not in the driver.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Tue, 17 Jan 2017 17:19:18 +0000 (19:19 +0200)]
net/mlx4_en: Remove unnecessary checks when setting num channels
Boundary checks for the number of RX, TX, other and combined channels
should be done by the caller and not in the driver.
In addition, remove the wrong memset in get channels, as it overwrites the cmd
field in the requester struct.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Tue, 17 Jan 2017 17:19:17 +0000 (19:19 +0200)]
net: ethtool: Initialize buffer when querying device channel settings
The ethtool channels response struct was uninitialized when querying the device
channel boundary settings. As a result, fields not reported by the driver
hold garbage. This may cause unsupported params to be sent to the driver.
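As a hedged illustration of why the initialization matters, the sketch below uses a simplified struct and a made-up driver callback. struct channels, CMD_GET_CHANNELS and drv_get_channels() are hypothetical names, not the ethtool API.

```c
#include <stdint.h>

#define CMD_GET_CHANNELS 1u  /* hypothetical command code */

/* Simplified stand-in for the channels request/response struct. */
struct channels {
    uint32_t cmd;
    uint32_t max_rx, max_tx, max_other, max_combined;
    uint32_t rx_count, tx_count, other_count, combined_count;
};

/* Hypothetical driver callback that reports only combined channels. */
static void drv_get_channels(struct channels *ch)
{
    ch->max_combined = 8;
    ch->combined_count = 4;
}

static struct channels query_channels(void)
{
    /* The designated initializer zeroes every field not named here. */
    struct channels ch = { .cmd = CMD_GET_CHANNELS };

    drv_get_channels(&ch);
    return ch;
}
```

Fields the driver never writes (max_rx, max_tx, ...) read back as 0 instead of stack garbage, and cmd is left intact because the driver does not memset the struct.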
Fixes: 8bf368620486 ('ethtool: ensure channel counts are within bounds ...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Jan 2017 16:36:41 +0000 (11:36 -0500)]
Merge tag 'linux-can-fixes-for-4.10-20170118' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-01-18
this is a pull request for net/master consisting of two patches.
In the first patch Einar Jón fixes a NULL-pointer-deref in the c_can_pci
driver. In the second patch Yegor Yefremov fixes the clock handling in the
ti_hecc driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yegor Yefremov [Wed, 18 Jan 2017 10:35:57 +0000 (11:35 +0100)]
can: ti_hecc: add missing prepare and unprepare of the clock
In order to make the driver work with the common clock framework, this
patch converts the clk_enable()/clk_disable() to
clk_prepare_enable()/clk_disable_unprepare().
Also add error checking for clk_prepare_enable().
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Einar Jón [Fri, 12 Aug 2016 11:50:41 +0000 (13:50 +0200)]
can: c_can_pci: fix null-pointer-deref in c_can_start() - set device pointer
The priv->device pointer for c_can_pci is never set, but it is used
without a NULL check in c_can_start(). Setting it in c_can_pci_probe()
like c_can_plat_probe() prevents c_can_pci.ko from crashing, with and
without CONFIG_PM.
This might also cause the pm_runtime_*() functions in c_can.c to
actually be executed for c_can_pci devices - they are the only other
place where priv->device is used, but they all contain a null check.
Signed-off-by: Einar Jón <tolvupostur@gmail.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alvaro G. M [Tue, 17 Jan 2017 08:08:16 +0000 (09:08 +0100)]
net: phy: dp83848: add DP83620 PHY support
This PHY with fiber support is register compatible with DP83848,
so add support for it.
Signed-off-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Tue, 17 Jan 2017 06:17:29 +0000 (22:17 -0800)]
bpf: Fix test_lru_sanity5() in test_lru_map.c
test_lru_sanity5() fails when the number of online cpus
is fewer than the number of possible cpus. It can be
reproduced with qemu by using cmd args "--smp cpus=2,maxcpus=8".
The problem is the loop in test_lru_sanity5() is testing
'i' which is incorrect.
This patch:
1. Make sched_next_online() always return -1 if it cannot
find a next cpu to schedule the process.
2. In test_lru_sanity5(), the parent process does
sched_setaffinity() first (through sched_next_online())
and the forked process will inherit it according to
the 'man sched_setaffinity'.
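The contract described in item 1 can be sketched as follows. To keep the example self-contained, pinning is simulated with a lookup in a caller-supplied "online" array instead of a real sched_setaffinity() call; the function name and signature are assumptions for illustration.

```c
#include <stdbool.h>

/*
 * Try CPUs starting at *next_to_try. Return 0 as soon as one can be
 * used, or -1 if none of the remaining possible CPUs is usable.
 * *next_to_try is advanced past every CPU that was tried.
 */
static int sched_next_online_sketch(const bool *online, int nr_cpus,
                                    int *next_to_try)
{
    int next = *next_to_try;
    int ret = -1;  /* stays -1 unless a CPU accepts the process */

    while (next < nr_cpus) {
        int cpu = next++;

        if (online[cpu]) {  /* stands in for a successful sched_setaffinity() */
            ret = 0;
            break;
        }
    }

    *next_to_try = next;
    return ret;
}
```

A caller looping over CPUs can then stop on the return value rather than on its own loop counter, which is what goes wrong when fewer CPUs are online than possible.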
Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lance Richardson [Mon, 16 Jan 2017 23:37:58 +0000 (18:37 -0500)]
vxlan: fix byte order of vxlan-gpe port number
vxlan->cfg.dst_port is in network byte order, so an htons()
is needed here. Also reduced comment length to stay closer
to 80 column width (still slightly over, however).
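A small sketch of the byte-order point, assuming a helper named is_default_gpe_port() that does not exist in the driver. 4790 is the IANA-assigned VXLAN-GPE UDP port; the stored port is in network byte order, so the host-order constant has to go through htons() before the comparison.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define VXLAN_GPE_UDP_PORT 4790  /* host byte order */

/* dst_port_be is in network byte order, like vxlan->cfg.dst_port. */
static bool is_default_gpe_port(uint16_t dst_port_be)
{
    /* Convert the constant, not the stored value, before comparing. */
    return dst_port_be == htons(VXLAN_GPE_UDP_PORT);
}
```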
Fixes: e1e5314de08b ("vxlan: implement GPE")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Tue, 17 Jan 2017 11:23:21 +0000 (12:23 +0100)]
stmmac: add missing of_node_put
The function stmmac_dt_phy provides several possibilities for initializing
plat->mdio_node, all of which have the effect of increasing the reference
count of the assigned value. This field is not updated elsewhere, so the
value is live until the end of the lifetime of plat (devm_allocated), just
after the end of stmmac_remove_config_dt. Thus, add an of_node_put on
plat->mdio_node in stmmac_remove_config_dt. It is possible that the field
mdio_node is never initialized, but of_node_put is NULL-safe, so it is also
safe to call of_node_put in that case.
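The reference-count reasoning above can be sketched in userspace. struct node, node_get(), node_put() and remove_config() are hypothetical stand-ins for the device-tree node API and the stmmac platform code; only the NULL-safe "put" behaviour is taken from the text.

```c
#include <stddef.h>

struct node {
    int refcount;
};

/* Take a reference; NULL is passed through unchanged. */
static struct node *node_get(struct node *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Drop a reference; calling this with NULL is a no-op. */
static void node_put(struct node *n)
{
    if (n)
        n->refcount--;
}

struct plat {
    struct node *mdio_node;  /* may never have been initialized */
};

/* Balance the reference taken when mdio_node was assigned. */
static void remove_config(struct plat *p)
{
    node_put(p->mdio_node);
}
```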
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rolf Neugebauer [Tue, 17 Jan 2017 18:13:51 +0000 (18:13 +0000)]
virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit
This patch partly reverts fd2a0437dc33 and e858fae2b0b8, which introduced a
subtle change in how the virtio_net flags are derived from the SKB's
ip_summed field.
With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID
when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to
ip_summed == CHECKSUM_NONE, which should be the same.
Further, the virtio spec 1.0 / CS04 explicitly says that
VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver.
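A hedged sketch of the resulting transmit-side mapping. The two flag values are the ones defined by the virtio specification; the enum and the function name xmit_hdr_flags() are hypothetical stand-ins for the kernel's CHECKSUM_* values and header-building code.

```c
#include <stdint.h>

/* Flag values as defined by the virtio specification. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_F_DATA_VALID 2

/* Hypothetical stand-ins for the kernel's CHECKSUM_* states. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

static uint8_t xmit_hdr_flags(enum csum_state ip_summed)
{
    if (ip_summed == CSUM_PARTIAL)
        return VIRTIO_NET_HDR_F_NEEDS_CSUM;

    /* CSUM_NONE and CSUM_UNNECESSARY are treated alike: no flags. */
    return 0;
}
```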
Fixes: fd2a0437dc33 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Fixes: e858fae2b0b8 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 17 Jan 2017 17:33:10 +0000 (09:33 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Handle multicast packets properly in fast-RX path of mac80211, from
Johannes Berg.
2) Because of a logic bug, the user can't actually force SW
checksumming on r8152 devices. This makes diagnosis of hw
checksumming bugs really annoying. Fix from Hayes Wang.
3) VXLAN route lookup does not take the source and destination ports
into account, which means IPSEC policies cannot be matched properly.
Fix from Martynas Pumputis.
4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger.
5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky.
6) If lwtunnel_fill_encap() fails, we do not abort the netlink message
construction properly in fib_dump_info(), from David Ahern.
7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan
Schmidt.
8) Openvswitch conntrack actions need to maintain a correct checksum,
fix from Lance Richardson.
9) ax25_disconnect() is missing a check for ax25->sk being NULL, in
fact it already checks this, but not in all of the necessary spots.
Fix from Basil Gunn.
10) Action GET operations in the packet scheduler can erroneously bump
the reference count of the entry, making it unreleasable. Fix from
Jamal Hadi Salim. Jamal gives a great set of example command lines
that trigger this in the commit message.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
net sched actions: fix refcnt when GETing of action after bind
net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
net/mlx4_core: Fix racy CQ (Completion Queue) free
net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered
net/mlx5e: Fix a -Wmaybe-uninitialized warning
ax25: Fix segfault after sock connection timeout
bpf: rework prog_digest into prog_tag
tipc: allocate user memory with GFP_KERNEL flag
net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types
ip6_tunnel: Account for tunnel header in tunnel MTU
mld: do not remove mld souce list info when set link down
be2net: fix MAC addr setting on privileged BE3 VFs
be2net: don't delete MAC on close on unprivileged BE3 VFs
be2net: fix status check in be_cmd_pmac_add()
cpmac: remove hopeless #warning
ravb: do not use zero-length alignment DMA descriptor
mlx4: do not call napi_schedule() without care
openvswitch: maintain correct checksum state in conntrack actions
tcp: fix tcp_fastopen unaligned access complaints on sparc
...
Linus Torvalds [Tue, 17 Jan 2017 17:27:50 +0000 (09:27 -0800)]
Merge branch 'stable/for-linus-4.10' of git://git./linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
"A tiny fix to make sure that page-sized mappings are page-aligned (and
not say straddle two pages). This is important for some drivers (such
as NVME)"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: ensure that page-sized mappings are page-aligned
Linus Torvalds [Tue, 17 Jan 2017 17:08:19 +0000 (09:08 -0800)]
Merge tag 'mmc-v4.10-rc3' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- fix regressions detecting HS/HS DDR eMMC cards related to CMD6
MMC host:
- mmc: mxs-mmc: Fix additional cycles after transmission stop
- sdhci-acpi: Only powered up enabled acpi child devices
- meson: avoid possible NULL dereference"
* tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Restore parts of the polling policy when switch to HS/HS DDR
mmc: mxs-mmc: Fix additional cycles after transmission stop
mmc: sdhci-acpi: Only powered up enabled acpi child devices
MMC: meson: avoid possible NULL dereference
Linus Torvalds [Tue, 17 Jan 2017 16:50:59 +0000 (08:50 -0800)]
Merge tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Just NAND updates from Boris:
- avoid compiling xway NAND controller driver as a module (which
didn't work)
- fix tango NAND DT binding and make sure the controller is in a
clean state at probe time
- add dependency on HAS_IOMEM to the oxnas NAND driver
- fix irq number validity check in the lpc32xx driver"
* tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd:
mtd: nand: lpc32xx: fix invalid error handling of a requested irq
mtd: nand: tango: Reset pbus to raw mode in probe
mtd: nand: tango: Update DT binding description
mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM
mtd: nand: xway: fix build because of module functions
mtd: nand: xway: disable module support
Jamal Hadi Salim [Sun, 15 Jan 2017 15:14:06 +0000 (10:14 -0500)]
net sched actions: fix refcnt when GETing of action after bind
Demonstrating the issue:
.. add a drop action
$sudo $TC actions add action drop index 10
.. retrieve it
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 0 installed 29 sec used 29 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
... bug 1 above: reference is two.
Reference is actually 1 but we forget to subtract 1.
... do a GET again and we see the same issue
try a few times and nothing changes
~$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 0 installed 31 sec used 31 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
... lets try to bind the action to a filter..
$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
... and now a few GETs:
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 3 bind 1 installed 204 sec used 204 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 4 bind 1 installed 206 sec used 206 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 5 bind 1 installed 235 sec used 235 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
.... as can be observed the reference count keeps going up.
After the fix
$ sudo $TC actions add action drop index 10
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 1 bind 0 installed 4 sec used 4 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 1 bind 0 installed 6 sec used 6 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 1 installed 32 sec used 32 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
$ sudo $TC -s actions get action gact index 10
action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 1 installed 33 sec used 33 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 Jan 2017 20:15:59 +0000 (12:15 -0800)]
Merge tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- fix invalid fget()/fput() calls when doing file locking
- fix multiple directory cache invalidation issues due to the client
failing to recognise that the directory wasn't changed
- fix client recovery when server reboots multiple times
* tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix client recovery when server reboots multiple times
NFSv4: update_changeattr should update the attribute timestamp
NFSv4: Don't call update_changeattr() unless the unlink is successful
NFSv4: Don't apply change_info4 twice on rename within a directory
NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created
nfs: Don't take a reference on fl->fl_file for LOCK operation
David S. Miller [Mon, 16 Jan 2017 20:08:29 +0000 (15:08 -0500)]
Merge branch 'mlx4-core-fixes'
Tariq Toukan says:
====================
mlx4 core fixes
This patchset contains bug fixes from Jack to the mlx4 Core driver.
Patch 1 solves a race in the flow of CQ free.
Patch 2 moves some qp context flags update to the correct qp transition.
Patch 3 eliminates warnings from the path of SRQ_LIMIT that flood the message log,
and keeps them only in the path of SRQ_CATAS_ERROR.
Series generated against net commit:
1a717fcf8bbe Merge tag 'mac80211-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Mon, 16 Jan 2017 16:31:39 +0000 (18:31 +0200)]
net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
When running SRIOV, warnings for SRQ LIMIT events flood the Hypervisor's
message log when (correct, normally operating) apps use SRQ LIMIT events
as a trigger to post WQEs to SRQs.
Add more information to the existing debug printout for SRQ_LIMIT, and
output the warning messages only for the SRQ CATAS ERROR event.
Fixes: acba2420f9d2 ("mlx4_core: Add wrapper functions and comm channel and slave event support to EQs")
Fixes: e0debf9cb50d ("mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Mon, 16 Jan 2017 16:31:38 +0000 (18:31 +0200)]
net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
Save the qp context flags byte containing the flag disabling vlan stripping
in the RESET to INIT qp transition, rather than in the INIT to RTR
transition. Per the firmware spec, the flags in this byte are active
in the RESET to INIT transition.
As a result of saving the flags in the incorrect qp transition, when
switching dynamically from VGT to VST and back to VGT, the vlan
remained stripped (as is required for VST) and did not return to
not-stripped (as is required for VGT).
Fixes: f0f829bf42cd ("net/mlx4_core: Add immediate activate for VGT->VST->VGT")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Mon, 16 Jan 2017 16:31:37 +0000 (18:31 +0200)]
net/mlx4_core: Fix racy CQ (Completion Queue) free
In function mlx4_cq_completion() and mlx4_cq_event(), the
radix_tree_lookup requires a rcu_read_lock.
This is mandatory: if another core frees the CQ, it could
run the radix_tree_node_rcu_free() call_rcu() callback while
it's being used by the radix tree lookup function.
Additionally, in function mlx4_cq_event(), since we are adding
the rcu lock around the radix-tree lookup, we no longer need to take
the spinlock. Also, the synchronize_irq() call for the async event
eliminates the need for incrementing the cq reference count in
mlx4_cq_event().
Other changes:
1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
we no longer take this spinlock in the interrupt context.
The spinlock here, therefore, simply protects against different
threads simultaneously invoking mlx4_cq_free() for different cq's.
2. In function mlx4_cq_free(), we move the radix tree delete to before
the synchronize_irq() calls. This guarantees that we will not
access this cq during any subsequent interrupts, and therefore can
safely free the CQ after the synchronize_irq calls. The rcu_read_lock
in the interrupt handlers only needs to protect against corrupting the
radix tree; the interrupt handlers may access the cq outside the
rcu_read_lock due to the synchronize_irq calls which protect against
premature freeing of the cq.
3. In function mlx4_cq_event(), we change the mlx_warn message to mlx4_dbg.
4. We leave the cq reference count mechanism in place, because it is
still needed for the cq completion tasklet mechanism.
Fixes: 6d90aa5cf17b ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 15 Jan 2017 18:19:00 +0000 (19:19 +0100)]
net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered
Don't use netdev_info and friends before the net_device is registered.
This avoids ugly messages like
"meson8b-dwmac c9410000.ethernet (unnamed net_device) (uninitialized):
Enable RX Mitigation via HW Watchdog Timer"
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 15 Jan 2017 17:50:46 +0000 (19:50 +0200)]
net/mlx5e: Fix a -Wmaybe-uninitialized warning
As found by Olof's build bot, we gain a harmless warning about a
potential uninitialized variable reference in mlx5:
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'parse_tc_fdb_actions':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was declared here
This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair
that gcc is unfortunately unable to completely figure out.
The problem is that gcc cannot tell that if(IS_ERR()) in
mlx5e_route_lookup_ipv4() is equivalent to checking if(err) later,
so it assumes that 'out_dev' is used after the 'return PTR_ERR(rt)'.
The PTR_ERR_OR_ZERO() case by comparison is fairly easy to detect
by gcc, so it can't get that wrong, so it no longer warns.
Hadar Hen Zion already attempted to fix the warning earlier by adding fake
initializations, but that ended up not fully addressing all warnings, so
I'm reverting it now that it is no longer needed.
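For readers unfamiliar with the idiom, here is a userspace emulation of the error-pointer helpers and of the pattern described above. The helper bodies mirror the usual kernel semantics but are re-implemented here; route_lookup() and resolve() are hypothetical and are not the mlx5 code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-implementations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
    return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical lookup: returns an error pointer when asked to fail. */
static void *route_lookup(int fail)
{
    static int dev = 42;

    return fail ? ERR_PTR(-ENOENT) : (void *)&dev;
}

/* *out_dev is written only on success, and err says which case it was. */
static int resolve(int fail, int **out_dev)
{
    void *rt = route_lookup(fail);
    int err = PTR_ERR_OR_ZERO(rt);  /* one value the compiler can track */

    if (err)
        return err;

    *out_dev = rt;
    return 0;
}
```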
Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.10-rc3-98-gcff3b2c/
Fixes: a42485eb0ee4 ("net/mlx5e: TC ipv4 tunnel encap offload error flow fixes")
Fixes: a757d108dc1a ("net/mlx5e: Fix kbuild warnings for uninitialized parameters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basil Gunn [Sat, 14 Jan 2017 20:18:55 +0000 (12:18 -0800)]
ax25: Fix segfault after sock connection timeout
The ax.25 socket connection timed out and the sock struct has already
been taken down, i.e. the sock struct pointer is now NULL. Checking
the sock_flag causes the segfault. Check if the socket struct pointer
is NULL before checking sock_flag. This segfault is seen in
timed out netrom connections.
Please submit to -stable.
Signed-off-by: Basil Gunn <basil@pacabunga.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 13 Jan 2017 22:38:15 +0000 (23:38 +0100)]
bpf: rework prog_digest into prog_tag
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via
fdinfo/netlink") was recently discussed, partially due to the
admittedly suboptimal name "prog_digest" in combination
with sha1 hash usage; thus, inevitably and rightfully, concerns
about its security in terms of collision resistance were
raised with regard to use cases.
The intended use cases are debugging and introspection
only: providing a stable "tag" over the instruction sequence
that both kernel and user space can calculate independently.
It's not usable at all for making a security-relevant decision.
So collisions where two different instruction sequences generate
the same tag can happen, but ideally at a rather low rate. The
"tag" will be dumped in hex and is short enough to introspect
in tracepoints or kallsyms output along with other data such
as stack trace, etc. Thus, this patch performs a rename into
prog_tag and truncates the tag to a short output (64 bits) to
make it obvious it's not collision-free.
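The truncation and hex dump can be sketched as below. Which bytes of the digest are kept is an assumption here (the leading 8), and tag_from_digest() and tag_to_hex() are illustrative names, not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DIGEST_LEN 20  /* SHA-1 digest size in bytes */
#define TAG_LEN     8  /* 64-bit tag */

/* Keep only the leading TAG_LEN bytes of the digest (an assumption). */
static void tag_from_digest(const uint8_t digest[DIGEST_LEN],
                            uint8_t tag[TAG_LEN])
{
    memcpy(tag, digest, TAG_LEN);
}

/* Render the tag as 16 lowercase hex characters plus a NUL terminator. */
static void tag_to_hex(const uint8_t tag[TAG_LEN], char out[2 * TAG_LEN + 1])
{
    for (int i = 0; i < TAG_LEN; i++)
        snprintf(out + 2 * i, 3, "%02x", tag[i]);
}
```

A 16-character string is short enough to print next to other data in tracepoints, and its length makes it plain that it is an identifier rather than a collision-resistant hash.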
Should in future a hash or facility be needed with a security
relevant focus, then we can think about requirements, constraints,
etc that would fit to that situation. For now, rework the exposed
parts for the current use cases as long as nothing has been
released yet. Tested on x86_64 and s390x.
Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parthasarathy Bhuvaragan [Fri, 13 Jan 2017 14:46:25 +0000 (15:46 +0100)]
tipc: allocate user memory with GFP_KERNEL flag
Until now, we have always allocated memory with the GFP_ATOMIC flag.
When the system is under memory pressure and a user tries to send,
the send fails due to low memory. However, the user application
can wait for free memory if we allocate it using the GFP_KERNEL flag.
In this commit, we allocate memory with GFP_KERNEL for all user
allocations.
Reported-by: Rune Torgersen <runet@innovsys.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karicheri, Muralidharan [Fri, 13 Jan 2017 14:32:34 +0000 (09:32 -0500)]
net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types
Currently the dp83867 driver returns an error if the phy interface type
PHY_INTERFACE_MODE_RGMII_RXID is used to set the rx-only internal
delay. A similar issue occurs for PHY_INTERFACE_MODE_RGMII_TXID.
Fix this by also checking the interface type if a particular delay
value is missing in the phy dt bindings. Also update the DT document
accordingly.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Sitnicki [Fri, 13 Jan 2017 09:12:20 +0000 (10:12 +0100)]
ip6_tunnel: Account for tunnel header in tunnel MTU
With ip6gre we have a tunnel header which also makes the tunnel MTU
smaller. We need to reserve room for it. Previously we were using up
space reserved for the Tunnel Encapsulation Limit option
header (RFC 2473).
Also, after commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") our contract with the caller has
changed. Now we check if the packet length exceeds the tunnel MTU after
the tunnel header has been pushed, unlike before.
This is reflected in the check where we look at the packet length minus
the size of the tunnel header, which is already accounted for in tunnel
MTU.
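A rough sketch of the arithmetic, not the actual ip6_tunnel code: the
helper names are hypothetical, the constants are the fixed IPv6 header
size (40 bytes), the Tunnel Encapsulation Limit option header (8 bytes)
and the IPv6 minimum MTU (1280 bytes), and tun_hlen stands for the
tunnel header length (4 bytes for a base GRE header).

```c
#define DEMO_IPV6_HDR_LEN     40    /* fixed IPv6 header */
#define DEMO_ENCAP_LIMIT_LEN  8     /* Tunnel Encapsulation Limit option header */
#define DEMO_IPV6_MIN_MTU     1280  /* IPv6 minimum link MTU */

/* Tunnel MTU with room reserved for the outer IPv6 header, the tunnel
 * header and, if used, the encapsulation limit option. */
static int demo_tunnel_mtu(int link_mtu, int tun_hlen, int use_encap_limit)
{
    int mtu = link_mtu - DEMO_IPV6_HDR_LEN - tun_hlen;

    if (use_encap_limit)
        mtu -= DEMO_ENCAP_LIMIT_LEN;
    return mtu < DEMO_IPV6_MIN_MTU ? DEMO_IPV6_MIN_MTU : mtu;
}

/* The length check runs after the tunnel header has been pushed, so the
 * header is subtracted from the packet length before comparing. */
static int demo_exceeds_tunnel_mtu(int pkt_len, int tun_hlen, int mtu)
{
    return pkt_len - tun_hlen > mtu;
}
```

For a 1500 byte link and a 4 byte GRE header this gives a tunnel MTU of
1456, or 1448 when the encapsulation limit option is in use.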
Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Thu, 12 Jan 2017 13:19:37 +0000 (21:19 +0800)]
mld: do not remove mld souce list info when set link down
This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we now restore all source filter
info instead of flushing it.
Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when the link is set down. Remove
the igmp6_group_dropped() call from ipv6_mc_destroy_dev() since it has
already been called in ipv6_mc_down().
Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 Jan 2017 17:34:37 +0000 (09:34 -0800)]
Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for
older bugs"
* tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux:
svcrdma: avoid duplicate dma unmapping during error recovery
sunrpc: don't call sleeping functions from the notifier block callbacks
svcrpc: don't leak contexts on PROC_DESTROY
nfsd: fix supported attributes for acl & labels
Ivan Vecera [Fri, 13 Jan 2017 21:38:29 +0000 (22:38 +0100)]
be2net: fix MAC addr setting on privileged BE3 VFs
During interface opening, the MAC address stored in netdev->dev_addr is
programmed into the HW, with the exception of BE3 VFs, where the initial
MAC is programmed by the parent PF. This is OK as long as the MAC address
is not changed while the interface is down. When it is changed, the
requested MAC is stored in netdev->dev_addr and programmed into the HW
later, during opening. But this is skipped for all BE3 VFs, so the NIC HW
does not know anything about the change and all traffic is filtered.
This is the case for bonding with fail_over_mac == 0, where the MACs of
the slaves are changed while they are down.
The be2net behavior is too restrictive, because a BE3 VF that has
the FILTMGMT privilege is able to modify its MAC without
any restriction.
To solve the described problem, the driver should take care of these
privileged BE3 VFs so that the MAC is programmed during opening. By
contrast, unprivileged BE3 VFs should not be allowed to change their MAC
in any case.
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 13 Jan 2017 21:38:28 +0000 (22:38 +0100)]
be2net: don't delete MAC on close on unprivileged BE3 VFs
BE3 VFs without the FILTMGMT privilege are not allowed to modify their MAC,
VLAN table and UC/MC lists. So don't try to delete the MAC on such VFs.
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 13 Jan 2017 21:38:27 +0000 (22:38 +0100)]
be2net: fix status check in be_cmd_pmac_add()
The return value from be_mcc_notify_wait() contains a base completion status
together with an additional status. The base_status() macro needs to be
used to access the base status.
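Why comparing the raw return value goes wrong can be shown with a small
model. The bit layout below (base status in the low 16 bits, additional
status above it) is an assumption made for illustration, and the demo_
helpers are hypothetical; the driver's own base_status() macro is not
reproduced here.

```c
#include <stdint.h>

#define DEMO_BASE_MASK  0xffffu  /* assumed: base status in the low 16 bits */
#define DEMO_ADDL_SHIFT 16       /* assumed: additional status above it */

/* Combine a base status and an additional status into one value. */
static uint32_t demo_pack_status(uint16_t base, uint16_t addl)
{
    return ((uint32_t)addl << DEMO_ADDL_SHIFT) | base;
}

/* Extract only the base completion status. */
static uint16_t demo_base_status(uint32_t status)
{
    return (uint16_t)(status & DEMO_BASE_MASK);
}

/* Extract only the additional status. */
static uint16_t demo_addl_status(uint32_t status)
{
    return (uint16_t)(status >> DEMO_ADDL_SHIFT);
}
```

A combined value with base status 2 and a non-zero additional status does
not compare equal to 2, but its extracted base status does.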
Fixes: e3a7ae2 be2net: Changing MAC Address of a VF was broken
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 16 Jan 2017 13:20:54 +0000 (14:20 +0100)]
cpmac: remove hopeless #warning
The #warning was present 10 years ago when the driver first got merged.
As the platform is rather obsolete by now, it seems very unlikely that
the warning will cause anyone to fix the code properly.
kernelci.org reports the warning for every build in the meantime, so
I think it's better to just turn it into a code comment to reduce
noise.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masaru Nagai [Mon, 16 Jan 2017 10:45:21 +0000 (11:45 +0100)]
Eric Dumazet [Fri, 13 Jan 2017 16:39:24 +0000 (08:39 -0800)]
mlx4: do not call napi_schedule() without care
Disable BH around the call to napi_schedule() to avoid the following warning:
[ 52.095499] NOHZ: local_softirq_pending 08
[ 52.421291] NOHZ: local_softirq_pending 08
[ 52.608313] NOHZ: local_softirq_pending 08
Fixes: 8d59de8f7bb3 ("net/mlx4_en: Process all completions in RX rings after port goes up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Jan 2017 16:36:42 +0000 (11:36 -0500)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2017-01-16
Here are a couple of important 802.15.4 driver fixes for the 4.10
kernel.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulf Hansson [Fri, 13 Jan 2017 11:05:03 +0000 (12:05 +0100)]
mmc: core: Restore parts of the polling policy when switch to HS/HS DDR
Regressions for not being able to detect an eMMC HS DDR mode card have been
reported for the sdhci-esdhc-imx driver, but potentially other sdhci
variants may suffer from a similar problem.
The commit e173f8911f09 ("mmc: core: Update CMD13 polling policy when
switch to HS DDR mode") is causing the problem. It seems that change moved
one step too far regarding changing the host's timing before polling for a
busy card.
To fix this, let's move back to the behaviour when the host's timing is
updated after the polling, but before the switch status is fetched and
validated.
In cases when polling with CMD13, we keep validating the switch status at
each attempt. However, to align with the other card busy detection
mechanisms, let's fetch and validate the switch status also after the host's
timing is updated.
Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Reported-by: Gary Bisson <gary.bisson@boundarydevices.com>
Fixes: e173f8911f09 ("mmc: core: Update CMD13 polling policy when switch..")
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Dong Aisheng <aisheng.dong@nxp.com>
Cc: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Tested-by: Jagan Teki <jagan@amarulasolutions.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Tested-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
David S. Miller [Mon, 16 Jan 2017 03:17:59 +0000 (22:17 -0500)]
Merge tag 'mac80211-for-davem-2017-01-13' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We have a number of fixes, in part because I was late
in actually sending them out - will try to do better in
the future:
* handle VHT opmode properly when hostapd is controlling
full station state
* two fixes for minimum channel width in mac80211
* don't leave SMPS set to junk in HT capabilities
* fix headroom when forwarding mesh packets, recently
broken by another fix that failed to take into account
frame encryption
* fix the TID in null-data packets indicating EOSP (end
of service period) in U-APSD
* prevent attempting to use (and then failing which
results in crashes) TXQs on stations that aren't added
to the driver yet
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lance Richardson [Fri, 13 Jan 2017 00:33:18 +0000 (19:33 -0500)]
openvswitch: maintain correct checksum state in conntrack actions
When executing conntrack actions on skbuffs with checksum mode
CHECKSUM_COMPLETE, the checksum must be updated to account for
header pushes and pulls. Otherwise we get "hw csum failure"
logs similar to this (ICMP packet received on geneve tunnel
via ixgbe NIC):
[ 405.740065] genev_sys_6081: hw csum failure
[ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1
[ 405.740108] Call Trace:
[ 405.740110] <IRQ>
[ 405.740113] dump_stack+0x63/0x87
[ 405.740116] netdev_rx_csum_fault+0x3a/0x40
[ 405.740118] __skb_checksum_complete+0xcf/0xe0
[ 405.740120] nf_ip_checksum+0xc8/0xf0
[ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
[ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack]
[ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
[ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch]
[ 405.740145] ? netif_rx_internal+0x44/0x110
[ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch]
[ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch]
[ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch]
[ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch]
[ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch]
[ 405.740168] ? udp_rcv+0x1a/0x20
[ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0
[ 405.740172] ? ip_local_deliver+0x6f/0xe0
[ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0
[ 405.740176] ? ip_rcv_finish+0xdb/0x3a0
[ 405.740177] ? ip_rcv+0x2a7/0x400
[ 405.740180] ? __netif_receive_skb_core+0x970/0xa00
[ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch]
[ 405.740187] __netif_receive_skb_core+0x1dc/0xa00
[ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
[ 405.740197] __netif_receive_skb+0x18/0x60
[ 405.740199] netif_receive_skb_internal+0x40/0xb0
[ 405.740201] napi_gro_receive+0xcd/0x120
[ 405.740204] gro_cell_poll+0x57/0x80 [geneve]
[ 405.740206] net_rx_action+0x260/0x3c0
[ 405.740209] __do_softirq+0xc9/0x28c
[ 405.740211] irq_exit+0xd9/0xf0
[ 405.740213] do_IRQ+0x51/0xd0
[ 405.740215] common_interrupt+0x93/0x93
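The adjustment needed when a header is pulled can be illustrated with
plain ones' complement arithmetic. This is a userspace sketch with
hypothetical helper names, not the openvswitch or skb code: a checksum
covering header plus payload is turned into one covering only the
payload by subtracting the header's contribution. It assumes the pulled
header has an even length.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum over a buffer (big-endian words). */
static uint16_t demo_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold the carries back in */
    return (uint16_t)sum;
}

/* Ones' complement subtraction: a - b is a + ~b with end-around carry. */
static uint16_t demo_csum_sub(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint16_t)~b;

    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Checksum state after pulling the first hdr_len bytes of the packet. */
static uint16_t demo_csum_after_pull(const uint8_t *pkt, size_t len, size_t hdr_len)
{
    return demo_csum_sub(demo_csum(pkt, len), demo_csum(pkt, hdr_len));
}
```

If the stored sum is left untouched after the pull, it still covers the
header bytes and no longer matches the data that follows, which is the
kind of mismatch behind the "hw csum failure" report above.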
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 Jan 2017 00:21:59 +0000 (16:21 -0800)]
Linux 4.10-rc4
Linus Torvalds [Mon, 16 Jan 2017 00:09:50 +0000 (16:09 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
"This tree contains 4 fixes.
The first is a fix for a race that can cause oopses under the right
circumstances, and that someone just recently encountered.
Past that are several small trivial correct fixes. A real issue that
was blocking development of an out of tree driver, but does not appear
to have caused any actual problems for in-tree code. A potential
deadlock that was reported by lockdep. And a deadlock people have
experienced and took the time to track down caused by a cleanup that
removed the code to drop a reference count"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sysctl: Drop reference added by grab_header in proc_sys_readdir
pid: fix lockdep deadlock warning due to ucount_lock
libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
mnt: Protect the mountpoint hashtable with mount_lock
Linus Torvalds [Sun, 15 Jan 2017 20:40:53 +0000 (12:40 -0800)]
Merge tag 'char-misc-4.10-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 4.10-rc4 that resolve
some reported issues.
The MEI driver issue resolves a lot of problems that people have been
having, as does the mem driver fix. The other minor fixes resolve
other reported issues.
All of these have been in linux-next for a while"
* tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
vme: Fix wrong pointer utilization in ca91cx42_slave_get
auxdisplay: fix new ht16k33 build errors
ppdev: don't print a free'd string
extcon: return error code on failure
drivers: char: mem: Fix thinkos in kmem address checks
mei: bus: enable OS version only for SPT and newer
Linus Torvalds [Sun, 15 Jan 2017 20:38:53 +0000 (12:38 -0800)]
Merge tag 'driver-core-4.10-rc4' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single patch being reverted to remove a feature that was
added in 4.10-rc1 that isn't quite ready for release.
It will be redone as a debugfs file instead of a sysfs file in the
future"
* tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Add deferred_probe attribute to devices in sysfs"
Linus Torvalds [Sun, 15 Jan 2017 20:36:32 +0000 (12:36 -0800)]
Merge tag 'tty-4.10-rc4' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 4.10-rc4 to resolve a
number of reported issues.
Nothing major here at all, one revert of a problematic patch, and some
other tiny bugfixes. Full details are in the shortlog below.
All have been in linux-next with no reported issues"
* tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
sysrq: attach sysrq handler correctly for 32-bit kernel
Revert "tty: serial: 8250: add CON_CONSDEV to flags"
Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break
8250_pci: Fix potential use-after-free in error path
tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
Linus Torvalds [Sun, 15 Jan 2017 20:34:35 +0000 (12:34 -0800)]
Merge tag 'usb-4.10-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB driver fixes for 4.10-rc4 to resolve some
reported issues.
The "largest" here is a number of bugs being fixed in the ch341
usb-serial driver, to hopefully resolve the mess of different devices
floating around that use this driver that have been having problems
with the 4.10-rc1 release.
There's also a tiny musb fix that I missed in the last pull request,
as well as the traditional xhci fix rounding out the batch.
All have been in linux-next with no reported issues"
* tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: fix deadlock at host remove by running watchdog correctly
USB: serial: ch341: fix control-message error handling
usb: musb: fix runtime PM in debugfs
wusbcore: Fix one more crypto-on-the-stack bug
USB: serial: kl5kusb105: fix line-state error handling
USB: serial: ch341: fix baud rate and line-control handling
USB: serial: ch341: fix line settings after reset-resume
USB: serial: ch341: fix resume after reset
USB: serial: ch341: fix open error handling
USB: serial: ch341: fix modem-control and B0 handling
USB: serial: ch341: fix open and resume after B0
USB: serial: ch341: fix initial modem-control state
Linus Torvalds [Sun, 15 Jan 2017 20:28:14 +0000 (12:28 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Bugfixes for I2C. Mostly core this time which is a bit unusual but
nothing really scary in there"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: Avoid race conditions with IMC
i2c: fix spelling mistake: "insufficent" -> "insufficient"
i2c: print correct device invalid address
i2c: do not enable fall back to Host Notify by default
i2c: fix kernel memory disclosure in dev interface
Linus Torvalds [Sun, 15 Jan 2017 20:03:11 +0000 (12:03 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- unwinder fixes
- AMD CPU topology enumeration fixes
- microcode loader fixes
- x86 embedded platform fixes
- fix for a bootup crash that may trigger when clearcpuid= is used
with invalid values"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mpx: Use compatible types in comparison to fix sparse error
x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
x86/entry: Fix the end of the stack for newly forked tasks
x86/unwind: Include __schedule() in stack traces
x86/unwind: Disable KASAN checks for non-current tasks
x86/unwind: Silence warnings for non-current tasks
x86/microcode/intel: Use correct buffer size for saving microcode data
x86/microcode/intel: Fix allocation size of struct ucode_patch
x86/microcode/intel: Add a helper which gives the microcode revision
x86/microcode: Use native CPUID to tickle out microcode revision
x86/CPU: Add native CPUID variants returning a single datum
x86/boot: Add missing declaration of string functions
x86/CPU/AMD: Fix Bulldozer topology
x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
x86/cpu: Fix typo in the comment for Anniedale
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
Linus Torvalds [Sun, 15 Jan 2017 20:00:37 +0000 (12:00 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull NOHZ fix from Ingo Molnar:
"This fixes an old NOHZ race where we incorrectly calculate the next
timer interrupt in certain circumstances where hrtimers are pending,
that can cause hard to reproduce stalled-values artifacts in
/proc/stat"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Fix collision between tick and other hrtimers
Linus Torvalds [Sun, 15 Jan 2017 19:37:43 +0000 (11:37 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
driver fixes, plus miscellaneous tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Reject non sampling events with precise_ip
perf/x86/intel: Account interrupts for PEBS errors
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
perf/core: Fix sys_perf_event_open() vs. hotplug
perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
perf/x86: Set pmu->module in Intel PMU modules
perf probe: Fix to probe on gcc generated symbols for offline kernel
perf probe: Fix --funcs to show correct symbols for offline module
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
tools lib traceevent: Fix prev/next_prio for deadline tasks
perf record: Fix --switch-output documentation and comment
perf record: Make __record_options static
tools lib subcmd: Add OPT_STRING_OPTARG_SET option
perf probe: Fix to get correct modname from elf header
samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
samples/bpf sock_example: Avoid getting ethhdr from two includes
perf sched timehist: Show total scheduling time
Linus Torvalds [Sun, 15 Jan 2017 18:54:39 +0000 (10:54 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"A number of regression fixes:
- Fix a boot hang on machines that have somewhat unusual memory map
entries of phys_addr=0x0 num_pages=0, which broke due to a recent
commit. This commit got cherry-picked from the v4.11 queue because
the bug is affecting real machines.
- Fix a boot hang also reported by KASAN, caused by incorrect init
ordering introduced by a recent optimization.
- Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
that introduced an invalid assumption. Neither bug was seen in
the wild AFAIK"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Prune invalid memory map entries and fix boot regression
x86/efi: Don't allocate memmap through memblock after mm_init()
efi/libstub/arm*: Pass latest memory map to the kernel
Nikita Yushchenko [Wed, 11 Jan 2017 18:56:31 +0000 (21:56 +0300)]
swiotlb: ensure that page-sized mappings are page-aligned
Some drivers depend on page-sized mappings being page aligned.
Swiotlb already enforces such alignment for mappings greater than a page;
extend that to page-sized mappings as well.
Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine
assumes page-aligned mappings.
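The effect of extending the rule to page-sized mappings can be sketched
as follows. This is a simplified model, not the swiotlb allocator: the
demo_ helpers are hypothetical, the 4 KiB page and 2 KiB slot sizes are
assumptions, and the bounce pool is assumed to start on a page boundary.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_SLOT_SHIFT 11                        /* assumed 2 KiB slots */

/* Search stride in slots. With include_page_sized == 0 only mappings
 * larger than a page get a page-sized stride (the old rule); with 1,
 * page-sized mappings get it as well. */
static unsigned long demo_stride(size_t size, int include_page_sized)
{
    int aligned = include_page_sized ? size >= DEMO_PAGE_SIZE
                                     : size > DEMO_PAGE_SIZE;

    return aligned ? 1UL << (DEMO_PAGE_SHIFT - DEMO_SLOT_SHIFT) : 1UL;
}

/* First slot index at or after start that is a multiple of stride. */
static unsigned long demo_first_candidate(unsigned long start, unsigned long stride)
{
    return (start + stride - 1) / stride * stride;
}

/* Does the slot begin on a page boundary within the pool? */
static int demo_slot_page_aligned(unsigned long slot)
{
    return ((slot << DEMO_SLOT_SHIFT) & (DEMO_PAGE_SIZE - 1)) == 0;
}
```

Starting the search at slot 3, a 4096 byte mapping lands on slot 3 (byte
offset 6144, not page aligned) under the old rule and on slot 4 (byte
offset 8192) under the extended one.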
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Linus Torvalds [Sun, 15 Jan 2017 01:13:28 +0000 (17:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
The most notable fix here is probably the fix for a splice regression
("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a fencepost error in pipe_advance()
coredump: Ensure proper size of sparse core files
aio: fix lock dep warning
tmpfs: clear S_ISGID when setting posix ACLs
Linus Torvalds [Sun, 15 Jan 2017 01:07:04 +0000 (17:07 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- the virtio_blk stack DMA corruption fix from Christoph, fixing an
issue with VMAP stacks.
- O_DIRECT blkbits calculation fix from Chandan.
- discard regression fix from Christoph.
- queue init error handling fixes for nbd and virtio_blk, from Omar and
Jeff.
- two small nvme fixes, from Christoph and Guilherme.
- rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
to more closely follow what we do in other places in the block layer.
This interface is new for this series, so let's get the naming right
before releasing a kernel with this feature. From Damien.
* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't try to discard from __blkdev_issue_zeroout
sd: remove __data_len hack for WRITE SAME
nvme: use blk_rq_payload_bytes
scsi: use blk_rq_payload_bytes
block: add blk_rq_payload_bytes
block: Rename blk_queue_zone_size and bdev_zone_size
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
nvme-rdma: fix nvme_rdma_queue_is_ready
virtio_blk: fix panic in initialization error path
nbd: blk_mq_init_queue returns an error code on failure, not NULL
virtio_blk: avoid DMA to stack for the sense buffer
do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
Al Viro [Sun, 15 Jan 2017 00:33:08 +0000 (19:33 -0500)]
fix a fencepost error in pipe_advance()
The logic in pipe_advance() used to release all buffers past the new
position failed in cases where the number of buffers to release was equal
to pipe->buffers. If that happened, none of them were released,
leaving the pipe full. Worse, it was trivial to trigger, and we ended up
with a pipe full of uninitialized pages. IOW, it's an infoleak.
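The general shape of this kind of fencepost error in a power-of-two ring
can be reproduced in a few lines. This is an illustration with
hypothetical names, not the pipe_advance() code: walking "until the
index reaches the end" cannot tell an empty range from one that covers
the whole ring, because start and end are then equal. Counting the
slots to release is one way to avoid that.

```c
#include <stdbool.h>

#define DEMO_SLOTS 16u  /* ring size, must be a power of two */

static unsigned demo_next(unsigned idx)
{
    return (idx + 1) & (DEMO_SLOTS - 1);
}

/* Release [idx, end) by walking until the indices meet. */
static unsigned demo_release_until(bool used[DEMO_SLOTS], unsigned idx, unsigned end)
{
    unsigned released = 0;

    while (idx != end) {          /* never true if the range is the whole ring */
        used[idx] = false;
        released++;
        idx = demo_next(idx);
    }
    return released;
}

/* Release exactly count slots starting at idx. */
static unsigned demo_release_count(bool used[DEMO_SLOTS], unsigned idx, unsigned count)
{
    unsigned released = 0;

    while (count--) {
        used[idx] = false;
        released++;
        idx = demo_next(idx);
    }
    return released;
}

/* Fill the ring, release count slots from start, return how many went. */
static unsigned demo_run(unsigned start, unsigned count, bool counted)
{
    bool used[DEMO_SLOTS];
    unsigned end = (start + count) & (DEMO_SLOTS - 1);
    unsigned i;

    for (i = 0; i < DEMO_SLOTS; i++)
        used[i] = true;
    return counted ? demo_release_count(used, start, count)
                   : demo_release_until(used, start, end);
}
```

Both variants agree on partial ranges; only the request to release all
16 slots shows the difference.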
Cc: stable@vger.kernel.org # v4.9
Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Tested-by: "Alan J. Wylie" <alan@wylie.me.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dave Kleikamp [Wed, 11 Jan 2017 19:25:00 +0000 (13:25 -0600)]
coredump: Ensure proper size of sparse core files
If the last section of a core file ends with an unmapped or zero page,
the size of the file does not correspond with the last dump_skip() call.
gdb complains that the file is truncated, which can be confusing to users.
After all of the vma sections are written, make sure that the file size
is no smaller than the current file position.
This problem can be demonstrated with gdb's bigcore testcase on the
sparc architecture.
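The same effect can be seen from userspace with POSIX calls. This is an
analogue for illustration, not the coredump code: seeking past the end
of a file does not change its size, so a trailing skip leaves the file
short unless it is extended to the current position. The helper names
and the temporary file path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Current size of the file behind fd, or -1 on error. */
static off_t demo_file_size(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}

/* Make sure the file is no smaller than the current file position. */
static int demo_extend_to_position(int fd)
{
    off_t pos = lseek(fd, 0, SEEK_CUR);
    off_t size = demo_file_size(fd);

    if (pos < 0 || size < 0)
        return -1;
    return size < pos ? ftruncate(fd, pos) : 0;
}

/* Write 4 bytes, skip 4096 bytes, optionally apply the fix, and return
 * the resulting file size (or -1 on error). */
static long demo_sparse_size(int fix)
{
    char path[] = "/tmp/sparse_demo_XXXXXX";
    int fd = mkstemp(path);
    long size = -1;

    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) == 4 && lseek(fd, 4096, SEEK_CUR) == 4100) {
        if (!fix || demo_extend_to_position(fd) == 0)
            size = (long)demo_file_size(fd);
    }
    close(fd);
    unlink(path);
    return size;
}
```

Without the extension the file stays at 4 bytes even though the position
is 4100; with it the size matches the position.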
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Shaohua Li [Tue, 13 Dec 2016 20:09:56 +0000 (12:09 -0800)]
aio: fix lock dep warning
lockdep reports a warning. file_start_write/file_end_write only
acquire/release the lock for regular files, so check for regular files
on the aio side too.
[ 453.532141] ------------[ cut here ]------------
[ 453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
[ 453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0)
[ 453.533011] Modules linked in:
[ 453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964
[ 453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
[ 453.533011] ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000
[ 453.533011] ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008
[ 453.533011] ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef
[ 453.533011] Call Trace:
[ 453.533011] [<ffffffff8196cffb>] dump_stack+0x67/0x9c
[ 453.533011] [<ffffffff81091ee1>] __warn+0x111/0x130
[ 453.533011] [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0
[ 453.533011] [<ffffffff81091f00>] ? __warn+0x130/0x130
[ 453.533011] [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60
[ 453.533011] [<ffffffff811205d4>] lock_release+0x434/0x670
[ 453.533011] [<ffffffff8198af94>] ? import_single_range+0xd4/0x110
[ 453.533011] [<ffffffff81322195>] ? rw_verify_area+0x65/0x140
[ 453.533011] [<ffffffff813aa696>] ? aio_write+0x1f6/0x280
[ 453.533011] [<ffffffff813aa6c9>] aio_write+0x229/0x280
[ 453.533011] [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640
[ 453.533011] [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0
[ 453.533011] [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
[ 453.533011] [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40
[ 453.533011] [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0
[ 453.533011] [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10
[ 453.533011] [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10
[ 453.533011] [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270
[ 453.533011] [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0
[ 453.533011] [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 453.533011] [<ffffffff813acb90>] SyS_io_submit+0x10/0x20
[ 453.533011] [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad
[ 453.533011] [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110
[ 453.533011] ---[ end trace b2fbe664d1cc0082 ]---
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 14 Jan 2017 19:09:24 +0000 (11:09 -0800)]
Merge tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"The fixes this time around are spread over drivers, pretty normal
update:
- PCI ID for SKL ioatdma, workaround for SKX and
ioat_alloc_chan_resources sleepy allocation fix
- dw kconfig typo fix
- null pointer deref for stm32
- MAINTAINERS Update for at_hdmac
- pl330 runtime pm fixes
- omap-dma port window fix
- rcar-dmac unmap slave resource fix"
* tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: rcar-dmac: unmap slave resource when channel is freed
dmaengine: omap-dma: Fix the port_window support
dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
dmaengine: pl330: Fix runtime PM support for terminated transfers
MAINTAINERS: dmaengine: Update + Hand over the at_hdmac driver to Ludovic
dmaengine: omap-dma: Fix dynamic lch_map allocation
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
dmaengine: stm32-dma: Set correct args number for DMA request from DT
dmaengine: dw: fix typo in Kconfig
dmaengine: ioatdma: workaround SKX ioatdma version
dmaengine: ioatdma: Add Skylake PCI Dev ID
Peter Jones [Mon, 12 Dec 2016 23:42:28 +0000 (18:42 -0500)]
efi/x86: Prune invalid memory map entries and fix boot regression
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
These machines fail to boot after the following commit:
commit 8e80632fb23f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Fix this by removing such bogus entries from the memory map.
Furthermore, currently the log output for this case (with efi=debug)
looks like:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
This is clearly wrong, and also not as informative as it could be. This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries. It also detects the
display of the address range calculation overflow, so the new output is:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid)
It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
It then removes these entries from the memory map.
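The two conditions described above (an entry with no pages, and a range
that would run past the end of the 64-bit physical address space) can be
sketched as a standalone check. This is a simplified illustration, not
the kernel's validation routine: the demo_ names are hypothetical and
the only constant assumed is the 4 KiB EFI page size.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EFI_PAGE_SHIFT 12  /* EFI pages are 4 KiB */
#define DEMO_EFI_PAGES_MAX  (UINT64_MAX >> DEMO_EFI_PAGE_SHIFT)

/* Returns false for entries that describe no memory or whose end
 * address cannot be represented in 64 bits. */
static bool demo_entry_valid(uint64_t phys_addr, uint64_t num_pages)
{
    if (num_pages == 0)
        return false;               /* empty entry */
    if (num_pages > DEMO_EFI_PAGES_MAX)
        return false;               /* size alone overflows 64 bits */
    if (DEMO_EFI_PAGES_MAX - num_pages < (phys_addr >> DEMO_EFI_PAGE_SHIFT))
        return false;               /* start plus size overflows */
    return true;
}
```

The comparison is done in units of pages so that the check itself never
overflows. Both examples from the text, phys_addr=0x0 with num_pages=0
and phys_addr=0xfffffffffffff000 with num_pages=0x0200000000000001, are
rejected by this sketch.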
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log]
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Greg Kroah-Hartman [Sat, 14 Jan 2017 13:09:03 +0000 (14:09 +0100)]
Revert "driver core: Add deferred_probe attribute to devices in sysfs"
This reverts commit 6751667a29d6fd64afb9ce30567ad616b68ed789.
Rob Herring objected to it, and a replacement for it will be added using
debugfs in the future.
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Olsa [Tue, 3 Jan 2017 14:24:54 +0000 (15:24 +0100)]
perf/x86: Reject non sampling events with precise_ip
As Peter suggested [1], reject non-sampling PEBS events,
because they don't make any sense and could cause bugs
in the NMI handler [2].
[1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
[2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Wed, 28 Dec 2016 13:31:03 +0000 (14:31 +0100)]
perf/x86/intel: Account interrupts for PEBS errors
It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:
taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:
NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
...
RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
...
Call Trace:
? trace_hardirqs_on_caller+0xf5/0x1b0
? perf_cgroup_attach+0x70/0x70
perf_install_in_context+0x199/0x1b0
? ctx_resched+0x90/0x90
SYSC_perf_event_open+0x641/0xf90
SyS_perf_event_open+0x9/0x10
do_syscall_64+0x6c/0x1f0
entry_SYSCALL64_slow_path+0x25/0x25
Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.
We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.
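A rough model of the interrupt accounting, to show why an error-only stream must go through the same counter. The limit value, the field names and the per-tick sequence number are assumptions made for the sketch:

```python
from dataclasses import dataclass

MAX_SAMPLES_PER_TICK = 4  # illustrative limit, not the kernel default


@dataclass
class HwEvent:
    interrupts: int = 0
    interrupts_seq: int = -1  # tick in which 'interrupts' was last reset
    throttled: bool = False


def account_interrupt(hw: HwEvent, tick_seq: int) -> bool:
    """Count one interrupt; return True when the event must be throttled."""
    if hw.interrupts_seq != tick_seq:
        # First interrupt in a new tick: restart the count.
        hw.interrupts_seq = tick_seq
        hw.interrupts = 1
        return False
    hw.interrupts += 1
    if hw.interrupts >= MAX_SAMPLES_PER_TICK:
        hw.throttled = True
        return True
    return False
```

If the error path never calls the accounting function, the counter stays at zero and the throttle can never trigger, which matches the lockup described above.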
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 11 Jan 2017 20:09:50 +0000 (21:09 +0100)]
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try to move the same pre-existing software group
into a hardware context.
The problem is exactly that described in commit:
f63a8daa5812 ("perf: Fix event->ctx locking")
... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.
That very same commit failed to recognise sys_perf_event_open() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.
So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).
Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.
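The "lock, then validate" rule can be sketched as below. This is a simplified model: it leaves out the reference counting and RCU protection a real implementation needs, and all names are hypothetical:

```python
import threading


class Ctx:
    def __init__(self, name):
        self.name = name
        self.mutex = threading.Lock()


class Event:
    def __init__(self, ctx):
        self.ctx = ctx


def lock_event_ctx(event):
    """Lock event.ctx, retrying if the event was moved while we waited."""
    while True:
        ctx = event.ctx       # snapshot the relation before blocking
        ctx.mutex.acquire()   # may sleep; event.ctx can change meanwhile
        if event.ctx is ctx:  # validate after acquiring the lock
            return ctx
        ctx.mutex.release()   # stale context: drop it and try again
```

The caller gets back a locked context that is guaranteed to still be the event's context at the time the lock was taken.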
Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Min Chong <mchong@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 9 Dec 2016 13:59:00 +0000 (14:59 +0100)]
perf/core: Fix sys_perf_event_open() vs. hotplug
There is a problem with installing an event in a task that is 'stuck' on
an offline CPU.
Blocked tasks are not disassociated from offlined CPUs; after all, a
blocked task doesn't run and doesn't require a CPU etc. Only on
wakeup do we amend the situation and place the task on an available
CPU.
If we hit such a task with perf_install_in_context() we'll loop until
either that task wakes up or the CPU comes back online. If the task
waking depends on the event being installed, we're stuck.
While looking into this issue, I also spotted another problem: if we
hit a task with perf_install_in_context() that is in the middle of
being migrated, that is we observe the old CPU before sending the IPI,
but run the IPI (on the old CPU) while the task is already running on
the new CPU, things also go sideways.
Rework things to rely on task_curr() -- outside of rq->lock -- which
is rather tricky. Imagine the following scenario where we're trying to
install the first event into our task 't':
CPU0                            CPU1                            CPU2

                                (current == t)

t->perf_event_ctxp[] = ctx;
smp_mb();
cpu = task_cpu(t);

                                switch(t, n);
                                                                migrate(t, 2);
                                                                switch(p, t);

                                                                ctx = t->perf_event_ctxp[]; // must not be NULL

smp_function_call(cpu, ..);

                                generic_exec_single()
                                  func();
                                    spin_lock(ctx->lock);
                                    if (task_curr(t)) // false

                                    add_event_to_ctx();
                                    spin_unlock(ctx->lock);

                                                                perf_event_context_sched_in();
                                                                  spin_lock(ctx->lock);
                                                                  // sees event
So it's CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
Because if CPU2's load of that variable were to observe NULL, it would
not try to schedule the ctx and we'd have a task running without its
counter, which would be 'bad'.
As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
first and do not see the event yet, then CPU0 must observe task_curr()
and retry. If the install happens first, then we must see the event on
sched-in and all is well.
I think we can translate the first part (until the 'must not be NULL')
of the scenario to a litmus test like:
C C-peterz

{
}

P0(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*x, 1);
	smp_mb();
	r1 = READ_ONCE(*y);
}

P1(int *y, int *z)
{
	WRITE_ONCE(*y, 1);
	smp_store_release(z, 1);
}

P2(int *x, int *z)
{
	int r1;
	int r2;

	r1 = smp_load_acquire(z);
	smp_mb();
	r2 = READ_ONCE(*x);
}

exists
(0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
Where:
x is perf_event_ctxp[],
y is our task's CPU, and
z is our task being placed on the rq of CPU2.
The P0 smp_mb() is the one added by this patch, ordering the store to
perf_event_ctxp[] from find_get_context() and the load of task_cpu()
in task_function_call().
The smp_store_release/smp_load_acquire model the RCpc locking of the
rq->lock and the smp_mb() of P2 is the context switch switching from
whatever CPU2 was running to our task 't'.
This litmus test evaluates to:
Test C-peterz Allowed
States 7
0:r1=0; 2:r1=0; 2:r2=0;
0:r1=0; 2:r1=0; 2:r2=1;
0:r1=0; 2:r1=1; 2:r2=1;
0:r1=1; 2:r1=0; 2:r2=0;
0:r1=1; 2:r1=0; 2:r2=1;
0:r1=1; 2:r1=1; 2:r2=0;
0:r1=1; 2:r1=1; 2:r2=1;
No
Witnesses
Positive: 0 Negative: 7
Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
Observation C-peterz Never 0 7
Hash=e427f41d9146b2a5445101d3e2fcaa34
And the strong and weak model agree.
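As a sanity check only, the same three threads can be enumerated under plain sequential consistency, where every interleaving of program-ordered steps is tried against a single shared memory. This sketch does not model barriers, acquire or release (sequential consistency already orders everything), so it says nothing about weaker models; it merely shows that the seven listed states are reachable and the 'exists' state is not:

```python
# Each thread is a list of (op, variable, value-or-register) steps.
P0 = [("W", "x", 1), ("R", "y", "0:r1")]
P1 = [("W", "y", 1), ("W", "z", 1)]
P2 = [("R", "z", "2:r1"), ("R", "x", "2:r2")]
REGS = ("0:r1", "2:r1", "2:r2")


def sc_outcomes(threads):
    """Return the set of final register tuples over all interleavings."""
    outcomes = set()

    def step(pcs, mem, regs):
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            outcomes.add(tuple(regs[r] for r in REGS))
            return
        for i, thread in enumerate(threads):
            if pcs[i] == len(thread):
                continue  # this thread has finished
            op, var, arg = thread[pcs[i]]
            next_mem, next_regs = dict(mem), dict(regs)
            if op == "W":
                next_mem[var] = arg
            else:
                next_regs[arg] = mem[var]
            step(pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:], next_mem, next_regs)

    step((0,) * len(threads), {"x": 0, "y": 0, "z": 0}, {})
    return outcomes


OUTCOMES = sc_outcomes([P0, P1, P2])
FORBIDDEN = (0, 1, 0)  # (0:r1, 2:r1, 2:r2) from the 'exists' clause
```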
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: jeremy.linton@arm.com
Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tobias Klauser [Thu, 12 Jan 2017 15:53:11 +0000 (16:53 +0100)]
x86/mpx: Use compatible types in comparison to fix sparse error
info->si_addr is of type void __user *, so it should be compared against
something from the same address space.
This fixes the following sparse error:
arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Len Brown [Fri, 13 Jan 2017 06:11:18 +0000 (01:11 -0500)]
x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
The Intel Denverton microserver uses a 25 MHz TSC crystal,
so we can derive its exact [*] TSC frequency
using CPUID and some arithmetic, e.g.:
TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
[*] 'exact' is only as good as the crystal, which should be +/- 20ppm
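The arithmetic can be written out as a small helper. It assumes the TSC/crystal ratio is available as a numerator and denominator (CPUID leaf 0x15 reports these in EBX and EAX); the function name is illustrative:

```python
def tsc_hz_from_ratio(crystal_hz: int, numerator: int, denominator: int) -> int:
    """TSC frequency in Hz = crystal frequency * numerator / denominator."""
    if numerator == 0 or denominator == 0:
        raise ValueError("CPUID did not report a usable TSC/crystal ratio")
    return crystal_hz * numerator // denominator


# Values from the example above: 25 MHz crystal, ratio 216/3.
DENVERTON_TSC_MHZ = tsc_hz_from_ratio(25_000_000, 216, 3) // 1_000_000
```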
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 14 Jan 2017 01:40:22 +0000 (17:40 -0800)]
Merge branch 'for-linus-4.10' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"These are all over the place.
The tracepoint part of the pull fixes a crash and adds a little more
information to two tracepoints, while the rest are good old fashioned
fixes"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: make tracepoint format strings more compact
Btrfs: add truncated_len for ordered extent tracepoints
Btrfs: add 'inode' for extent map tracepoint
btrfs: fix crash when tracepoint arguments are freed by wq callbacks
Btrfs: adjust outstanding_extents counter properly when dio write is split
Btrfs: fix lockdep warning about log_mutex
Btrfs: use down_read_nested to make lockdep silent
btrfs: fix locking when we put back a delayed ref that's too new
btrfs: fix error handling when run_delayed_extent_op fails
btrfs: return the actual error value from from btrfs_uuid_tree_iterate
Linus Torvalds [Sat, 14 Jan 2017 01:38:05 +0000 (17:38 -0800)]
Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two small fixups for the filesystem changes that went into this merge
window"
* tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
ceph: fix get_oldest_context()
ceph: fix mds cluster availability check
Linus Torvalds [Sat, 14 Jan 2017 01:35:43 +0000 (17:35 -0800)]
Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
- Export and make use of has_capability() to fix incorrect use of
ns_capable() for testing task capabilities (Jike Song)
* tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
vfio/type1: Remove pid_namespace.h include
vfio iommu type1: fix the testing of capability for remote task
capability: export has_capability
vfio-mdev: remove some dead code
vfio-mdev: buffer overflow in ioctl()
vfio-mdev: return -EFAULT if copy_to_user() fails
Linus Torvalds [Sat, 14 Jan 2017 01:06:24 +0000 (17:06 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- fix for module unload vs deferred jump labels (note: there might be
other buggy modules!)
- two NULL pointer dereferences from syzkaller
- also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
made worse during this merge window, "just" kernel memory leak on
releases
- fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix emulation of "MOV SS, null selector"
KVM: x86: fix NULL deref in vcpu_scan_ioapic
KVM: eventfd: fix NULL deref irqbypass consumer
KVM: x86: Introduce segmented_write_std
KVM: x86: flush pending lapic jump label updates on module unload
jump_labels: API for flushing deferred jump label updates
Linus Torvalds [Sat, 14 Jan 2017 01:00:42 +0000 (17:00 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix huge_ptep_set_access_flags() to return "changed" when any of the
ptes in the contiguous range is changed, not just the last one
- Fix the adr_l assembly macro to work in modules under KASLR
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: assembler: make adr_l work in modules under KASLR
arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
Christoph Hellwig [Fri, 13 Jan 2017 22:18:16 +0000 (15:18 -0700)]
block: don't try to discard from __blkdev_issue_zeroout
Discard can return -EIO asynchronously if the alignment for the request
isn't suitable for the driver, which makes a proper fallback to other
methods in __blkdev_issue_zeroout impossible. Thus only issue a sync
discard from blkdev_issue_zeroout and don't try discard at all from
__blkdev_issue_zeroout as a non-invasive workaround.
One more reason why abusing discard for zeroing must die..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: e73c23ff ("block: add async variant of blkdev_issue_zeroout")
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Fri, 13 Jan 2017 11:29:13 +0000 (12:29 +0100)]
sd: remove __data_len hack for WRITE SAME
Now that we have the blk_rq_payload_bytes helper available to determine
the actual I/O size we don't need to mess around with __data_len for
WRITE SAME.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Fri, 13 Jan 2017 11:29:12 +0000 (12:29 +0100)]
nvme: use blk_rq_payload_bytes
The new blk_rq_payload_bytes generalizes the payload length hacks
that nvme_map_len did before.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Fri, 13 Jan 2017 11:29:11 +0000 (12:29 +0100)]
scsi: use blk_rq_payload_bytes
Without that we'll pass a wrong payload size in cmd->sdb, which
can lead to hangs with drivers that need the total transfer size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Chris Valean <v-chvale@microsoft.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Fixes: f9d03f96 ("block: improve handling of the magic discard payload")
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Fri, 13 Jan 2017 11:29:10 +0000 (12:29 +0100)]
block: add blk_rq_payload_bytes
Add a helper to calculate the actual data transfer size for special
payload requests.
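A sketch of what such a helper computes, assuming a request either carries an ordinary data length or a separately built "special" payload whose length differs from it. The field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    data_len: int                              # length the block layer accounts for
    special_payload_len: Optional[int] = None  # set when the driver built its own payload


def payload_bytes(rq: Request) -> int:
    """Number of bytes actually transferred to or from the device."""
    if rq.special_payload_len is not None:
        return rq.special_payload_len
    return rq.data_len
```

A discard covering a large range might, for example, transfer only a small descriptor, which is why the two lengths have to be kept apart.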
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Linus Torvalds [Fri, 13 Jan 2017 20:38:36 +0000 (12:38 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The major fix is the bfa firmware, since the latest 10Gb cards fail
probing with the current firmware.
The rest is a set of minor fixes: one missed Kconfig dependency
causing randconfig failures, a missed error return on an error leg, a
change for how multiqueue waits on a blocked device and a don't reset
while in reset fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: bfa: Increase requested firmware version to 3.2.5.1
scsi: snic: Return error code on memory allocation failure
scsi: fnic: Avoid sending reset to firmware when another reset is in progress
scsi: qedi: fix build, depends on UIO
scsi: scsi-mq: Wait for .queue_rq() if necessary
Linus Torvalds [Fri, 13 Jan 2017 19:49:34 +0000 (11:49 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"Small driver fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
Input: adxl34x - make it enumerable in ACPI environment
Input: ALPS - fix TrackStick Y axis handling for SS5 hardware
Input: synaptics-rmi4 - fix F03 build error when serio is module
Input: xpad - use correct product id for x360w controllers
Input: synaptics_i2c - change msleep to usleep_range for small msecs
Input: i8042 - add Pegatron touchpad to noloop table
Input: joydev - remove unused linux/miscdevice.h include
Trond Myklebust [Fri, 13 Jan 2017 18:31:32 +0000 (13:31 -0500)]
NFSv4: Fix client recovery when server reboots multiple times
If the server reboots multiple times, the client should rely on the
server to tell it that it cannot reclaim state as per section 9.6.3.4
in RFC7530 and section 8.4.2.1 in RFC5661.
Currently, the client is being too conservative, and is assuming that
if the server reboots while state recovery is in progress, then it must
ignore state that was not recovered before the reboot.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Shannon Nelson [Thu, 12 Jan 2017 22:24:58 +0000 (14:24 -0800)]
tcp: fix tcp_fastopen unaligned access complaints on sparc
Fix up a data alignment issue on sparc by swapping the order
of the cookie byte array field with the length field in
struct tcp_fastopen_cookie, and making it a proper union
to clean up the typecasting.
This addresses log complaints like these:
log_unaligned: 113 callbacks suppressed
Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
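The effect of the reordering can be illustrated with ctypes, which follows the platform C layout rules. The two layouts below are a simplified approximation of the struct before and after the change; the field and class names, and the 16-byte cookie size, are assumptions for the sketch:

```python
import ctypes

COOKIE_MAX = 16  # assumed maximum cookie size in bytes


class CookieBefore(ctypes.Structure):
    # Length byte first: the cookie bytes start at an odd offset.
    _fields_ = [("len", ctypes.c_int8),
                ("val", ctypes.c_uint8 * COOKIE_MAX),
                ("exp", ctypes.c_bool)]


class CookieData(ctypes.Union):
    # A union with a word-sized member forces word alignment.
    _fields_ = [("val", ctypes.c_uint8 * COOKIE_MAX),
                ("words", ctypes.c_uint32 * (COOKIE_MAX // 4))]


class CookieAfter(ctypes.Structure):
    _fields_ = [("data", CookieData),
                ("len", ctypes.c_int8),
                ("exp", ctypes.c_bool)]


def layout(struct_type, field):
    """Return (offset of field, alignment of the whole struct)."""
    return getattr(struct_type, field).offset, ctypes.alignment(struct_type)
```

With the byte array behind a one-byte length, casting it to a wider type reads from a misaligned address; placing it first inside a union gives it the alignment of the widest member.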
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Thu, 12 Jan 2017 20:30:01 +0000 (21:30 +0100)]
ipv6: sr: fix several BUGs when preemption is enabled
When CONFIG_PREEMPT=y, CONFIG_IPV6=m and CONFIG_SEG6_HMAC=y,
seg6_hmac_init() is called during the initialization of the ipv6 module.
This causes a subsequent call to smp_processor_id() with preemption
enabled, resulting in the following trace.
[ 20.451460] BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1
[ 20.452556] caller is debug_smp_processor_id+0x17/0x19
[ 20.453304] CPU: 0 PID: 1 Comm: systemd Not tainted 4.9.0-rc5-00973-g46738b1 #1
[ 20.454406] ffffc9000062fc18 ffffffff813607b2 0000000000000000 ffffffff81a7f782
[ 20.455528] ffffc9000062fc48 ffffffff813778dc 0000000000000000 00000000001dcf98
[ 20.456539] ffffffffa003bd08 ffffffff81af93e0 ffffc9000062fc58 ffffffff81377905
[ 20.456539] Call Trace:
[ 20.456539] [<ffffffff813607b2>] dump_stack+0x63/0x7f
[ 20.456539] [<ffffffff813778dc>] check_preemption_disabled+0xd1/0xe3
[ 20.456539] [<ffffffff81377905>] debug_smp_processor_id+0x17/0x19
[ 20.460260] [<ffffffffa0061f3b>] seg6_hmac_init+0xfa/0x192 [ipv6]
[ 20.460260] [<ffffffffa0061ccc>] seg6_init+0x39/0x6f [ipv6]
[ 20.460260] [<ffffffffa006121a>] inet6_init+0x21a/0x321 [ipv6]
[ 20.460260] [<ffffffffa0061000>] ? 0xffffffffa0061000
[ 20.460260] [<ffffffff81000457>] do_one_initcall+0x8b/0x115
[ 20.460260] [<ffffffff811328a3>] do_init_module+0x53/0x1c4
[ 20.460260] [<ffffffff8110650a>] load_module+0x1153/0x14ec
[ 20.460260] [<ffffffff81106a7b>] SYSC_finit_module+0x8c/0xb9
[ 20.460260] [<ffffffff81106a7b>] ? SYSC_finit_module+0x8c/0xb9
[ 20.460260] [<ffffffff81106abc>] SyS_finit_module+0x9/0xb
[ 20.460260] [<ffffffff810014d1>] do_syscall_64+0x62/0x75
[ 20.460260] [<ffffffff816834f0>] entry_SYSCALL64_slow_path+0x25/0x25
Moreover, dst_cache_* functions also call smp_processor_id(), generating
a similar trace.
This patch uses raw_cpu_ptr() in seg6_hmac_init() rather than this_cpu_ptr()
and disables preemption when using dst_cache_* functions.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 12 Jan 2017 20:09:09 +0000 (12:09 -0800)]
net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim
The __bcm_sysport_tx_reclaim() function is used to reclaim transmit
resources in different places within the driver. Most of them should
not affect the state of the transmit flow control.
Introduce bcm_sysport_tx_clean() which cleans the ring, but does not
re-enable flow control towards the networking stack, and make
bcm_sysport_tx_reclaim() do the actual transmit queue flow control.
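The split can be pictured with a toy ring model: one method only frees descriptors, the other frees them and then restarts the stopped queue. This is a sketch with hypothetical names, not the driver code:

```python
class TxRing:
    def __init__(self, size):
        self.size = size
        self.in_flight = 0
        self.queue_stopped = False

    def xmit(self):
        """Queue one packet; stop the queue when the ring is full."""
        self.in_flight += 1
        if self.in_flight == self.size:
            self.queue_stopped = True

    def tx_clean(self, completed):
        """Free completed descriptors only; flow control is untouched."""
        done = min(completed, self.in_flight)
        self.in_flight -= done
        return done

    def tx_reclaim(self, completed):
        """Free descriptors, then re-enable the queue if anything was freed."""
        done = self.tx_clean(completed)
        if done and self.queue_stopped:
            self.queue_stopped = False
        return done
```

Paths such as teardown can then call the clean variant without accidentally waking a queue that should stay stopped.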
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Williamson [Thu, 12 Jan 2017 15:24:16 +0000 (08:24 -0700)]
vfio/type1: Remove pid_namespace.h include
Using has_capability() rather than ns_capable(), we're no longer using
this header.
Cc: Jike Song <jike.song@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Michal Kazior [Fri, 13 Jan 2017 12:32:51 +0000 (13:32 +0100)]
mac80211: prevent skb/txq mismatch
Station structure is considered as not uploaded
(to driver) until drv_sta_state() finishes. This
call is however done after the structure is
attached to mac80211 internal lists and hashes.
This means mac80211 can look up (and use) station
structure before it is uploaded to a driver.
If this happens (structure exists, but
sta->uploaded is false) fast_tx path can still be
taken. Deep in the fastpath call the sta->uploaded
is checked to derive the "pubsta" argument for
ieee80211_get_txq(). If sta->uploaded is false
(and sta is actually non-NULL) ieee80211_get_txq()
effectively downgrades to vif->txq.
At first glance this may look innocent but it coerces
mac80211 into a state that is almost guaranteed
(codel may drop offending skb) to crash because a
station-oriented skb gets queued up on
vif-oriented txq. The ieee80211_tx_dequeue() ends
up looking at info->control.flags and tries to use
txq->sta which in the fail case is NULL.
It's probably pointless to pretend one can
downgrade skb from sta-txq to vif-txq.
Since downgrading unicast traffic to vif->txq must
not be done there's no txq to put a frame on if
sta->uploaded is false. Therefore the code is made
to fall back to regular tx() op path if the
described condition is hit.
Only drivers using wake_tx_queue were affected.
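The resulting decision can be summarised in a few lines. This is a simplified model with made-up names and string labels in place of real queues:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    uploaded: bool
    txq: str = "sta-txq"


def pick_tx_path(sta: Optional[Station], vif_txq: str = "vif-txq") -> str:
    """Choose where a frame goes; never downgrade a station frame to the vif queue."""
    if sta is None:
        return vif_txq       # no station attached: the vif queue is correct
    if not sta.uploaded:
        return "regular-tx"  # station not yet known to the driver: fall back
    return sta.txq
```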
Example crash dump before fix:
Unable to handle kernel paging request at virtual address ffffe26c
PC is at ieee80211_tx_dequeue+0x204/0x690 [mac80211]
[<bf4252a4>] (ieee80211_tx_dequeue [mac80211]) from
[<bf4b1388>] (ath10k_mac_tx_push_txq+0x54/0x1c0 [ath10k_core])
[<bf4b1388>] (ath10k_mac_tx_push_txq [ath10k_core]) from
[<bf4bdfbc>] (ath10k_htt_txrx_compl_task+0xd78/0x11d0 [ath10k_core])
[<bf4bdfbc>] (ath10k_htt_txrx_compl_task [ath10k_core])
[<bf51c5a4>] (ath10k_pci_napi_poll+0x54/0xe8 [ath10k_pci])
[<bf51c5a4>] (ath10k_pci_napi_poll [ath10k_pci]) from
[<c0572e90>] (net_rx_action+0xac/0x160)
Reported-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Fri, 13 Jan 2017 10:28:25 +0000 (11:28 +0100)]
mac80211: initialize SMPS field in HT capabilities
ibss and mesh modes copy the ht capabilities from the band without
overriding the SMPS state. Unfortunately the default value 0 for the
SMPS field means static SMPS instead of disabled.
This results in HT ibss and mesh setups using only single-stream rates,
even though SMPS is not supposed to be active.
Initialize SMPS to disabled for all bands on ieee80211_hw_register to
ensure that the value is sane where it is not overridden with the real
SMPS state.
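The bit manipulation involved looks roughly like this. The constants are reproduced here as assumptions for the sketch (a two-bit SMPS field at bits 2-3 of the HT capability word, with 0 meaning static and 3 meaning disabled):

```python
IEEE80211_HT_CAP_SM_PS = 0x000C  # assumed mask of the SMPS field
IEEE80211_HT_CAP_SM_PS_SHIFT = 2
WLAN_HT_CAP_SM_PS_STATIC = 0
WLAN_HT_CAP_SM_PS_DISABLED = 3


def smps_mode(cap: int) -> int:
    """Extract the SMPS field from a 16-bit HT capability word."""
    return (cap & IEEE80211_HT_CAP_SM_PS) >> IEEE80211_HT_CAP_SM_PS_SHIFT


def set_smps_disabled(cap: int) -> int:
    """Return cap with the SMPS field set to 'disabled'."""
    cleared = cap & ~IEEE80211_HT_CAP_SM_PS & 0xFFFF
    return cleared | (WLAN_HT_CAP_SM_PS_DISABLED << IEEE80211_HT_CAP_SM_PS_SHIFT)
```

A zero-initialised capability word therefore reads as static SMPS until the field is set explicitly.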
Reported-by: Elektra Wagenrad <onelektra@gmx.net>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[move VHT TODO comment to a better place]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Niklas Söderlund [Wed, 11 Jan 2017 14:39:31 +0000 (15:39 +0100)]
dmaengine: rcar-dmac: unmap slave resource when channel is freed
The slave mapping should be removed together with other channel
resources when the channel is freed. If it's not unmapped it will hang
around forever after the channel is freed.
Fixes: 9f878603dbdb7db3 ("dmaengine: rcar-dmac: add iommu support for slave transfers")
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Jike Song [Thu, 12 Jan 2017 08:52:03 +0000 (16:52 +0800)]
vfio iommu type1: fix the testing of capability for remote task
Before the mdev enhancement type1 iommu used capable() to test the
capability of current task; in the course of mdev development a
new requirement, testing for another task other than current, was
raised. ns_capable() was used for this purpose, however it still
tests current, the only difference is, in a specified namespace.
Fix it by using has_capability() instead, which tests the cap for
specified task in init_user_ns, the same namespace as capable().
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Linus Torvalds [Thu, 12 Jan 2017 22:45:59 +0000 (14:45 -0800)]
Merge tag 'sound-4.10-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This time we got a few more fixes than the previous rc's, and most of
commits were about ASoC.
The only significant change in the core side is the regression fix wrt
the aux device list handling, and all the rest are driver-specific
small / trivial fixes"
* tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add a quirk for Plantronics BT600
ASoC: rt5645: set sel_i2s_pre_div1 to 2
ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
ASoC: Intel: Skylake: Release FW ctx in cleanup
ASoC: Intel: bytcr-rt5640: fix settings in internal clock mode
ASoC: fsl_ssi: set fifo watermark to more reliable value
ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
ASoC: nau8825: correct the function name of register
ASoC: Intel: Skylake: Fix to fail safely if module not available in path
ASoC: tlv320aic3x: Mark the RESET register as volatile
ASoC: Fix binding and probing of auxiliary components
ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
ASoC: Intel: bytcr_rt5640: fallback mechanism if MCLK is not enabled
ASoC: hdmi-codec: use unsigned type to structure members with bit-field
ASoC: topology: kfree kcontrol->private_value before freeing kcontrol
ASoC: rsnd: don't double free kctrl
ASoC: dwc: Fix PIO mode initialization
Vadim Lomovtsev [Thu, 12 Jan 2017 15:28:06 +0000 (07:28 -0800)]
net: thunderx: acpi: fix LMAC initialization
While probing BGX we request the appropriate QLM for its configuration
and get the LMAC count from that request. Then, while reading the configured
MAC values from the SSDT table, we need to save them in the proper mapping:
BGX[i]->lmac[j].mac = <MAC value>
so they can be provided to initialization later. In order to fill
this mapping properly we need a separate lmac index to use during
ACPI initialization, since at this point bgx->lmac_count already contains
the actual value.
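A sketch of the indexing idea, assuming the MAC addresses arrive as a sequence while the LMAC count is already final. The function and variable names are hypothetical:

```python
def fill_lmac_macs(lmac_count, macs_from_table):
    """Store each MAC at its own slot using a cursor separate from the count."""
    lmacs = [None] * lmac_count
    idx = 0  # dedicated index; lmac_count itself is not modified
    for mac in macs_from_table:
        if idx >= lmac_count:
            break  # ignore surplus table entries
        lmacs[idx] = mac
        idx += 1
    return lmacs
```

Using the count itself as the write position would start past the end of the populated range, which is the mistake a separate index avoids.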
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sriharsha Basavapatna [Mon, 9 Jan 2017 10:30:44 +0000 (16:00 +0530)]
svcrdma: avoid duplicate dma unmapping during error recovery
In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
problems when the iova being unmapped is subsequently reused. Remove
the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
handle it.
Fixes: 412a15c0fe53 ("svcrdma: Port to new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stefan Schmidt [Mon, 2 Jan 2017 15:58:13 +0000 (16:58 +0100)]
ieee802154: atusb: fix driver to work with older firmware versions
After the addition of the frame_retries callback we could run into cases where
an ATUSB device with an older firmware version would no longer be able to bring
the interface up.
We keep this functionality disabled now if the minimum firmware version for this
feature is not available.
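Gating a feature on the firmware version amounts to a tuple comparison. The minimum version used below is a placeholder for illustration, not a value taken from the patch:

```python
def supports_frame_retries(fw_major: int, fw_minor: int, min_version=(0, 3)) -> bool:
    """True if the firmware is at least min_version (a hypothetical threshold)."""
    return (fw_major, fw_minor) >= min_version
```

A driver would consult such a check once at probe time and leave the optional callback disabled when it returns False.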
Fixes: 5d82288b93db3bc ("ieee802154: atusb: implement .set_frame_retries ops callback")
Reported-by: Alexander Aring <aar@pengutronix.de>
Acked-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Andrey Smirnov [Sun, 18 Dec 2016 23:25:33 +0000 (15:25 -0800)]
at86rf230: Allow slow GPIO pins for "rstn"
Driver code never touches the "rstn" signal in atomic context, so there's
no need to implicitly put such a restriction on it by using gpio_set_value
to manipulate it. Replace gpio_set_value with gpio_set_value_cansleep to
fix that.
As an example of where such a restriction might be inconvenient,
consider a hardware design where "rstn" is connected to a pin of an I2C/SPI
GPIO expander chip.
Cc: Chris Healy <cphealy@gmail.com>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>