David S. Miller [Fri, 7 Feb 2020 10:36:22 +0000 (11:36 +0100)]
Merge branch 'stmmac-fixes'
Ong Boon Leong says:
====================
net: stmmac: general fixes for Ethernet functionality
1/5: It ensures that the previous value of GMAC_VLAN_TAG register is
read first before for updating the register.
2/5: Similar to 2/6 patch but it is a fix for XGMAC_VLAN_TAG register
as requested by Jose Abreu.
3/5: It ensures the GMAC IP v4.xx and above behaves correctly to:-
ip link set <devname> multicast off|on
4/5: Added similar IFF_MULTICAST flag for xgmac2, similar to 4/6.
5/5: It ensures PCI platform data is using plat->phy_interface.
Changes from v4:-
patch 1/6 - this patch is dropped now and will take the input on
handling return value from netif_set_real_num_rx|
tx_queues() in future patch series.
v3:-
patch 1/6 - add rtnl_lock() and rtnl_unlock() for stmmac_hw_setup()
called inside stmmac_resume()
patch 3/6 - Added new patch to fix XGMAC_VLAN_TAG register writting
v2:-
patch 1/5 - added control for rtnl_lock() & rtnl_unlock() to ensure
they are used forstmmac_resume()
patch 4/5 - added IFF_MULTICAST flag check for xgmac to ensure
multicast works correctly.
v1:-
- Drop v1 patches (1/7, 3/7 & 4/7) that are not valid.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Voon Weifeng [Fri, 7 Feb 2020 07:34:28 +0000 (15:34 +0800)]
net: stmmac: update pci platform data to use phy_interface
The recent patch to support passive mode converter did not take care the
phy interface configuration in PCI platform data. Hence, converting all
the PCI platform data from plat->interface to plat->phy_interface as the
default mode is meant for PHY.
Fixes:
0060c8783330 ("net: stmmac: implement support for passive mode converters via dt")
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Tested-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tan, Tee Min [Fri, 7 Feb 2020 07:34:15 +0000 (15:34 +0800)]
net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter
Without checking for IFF_MULTICAST flag, it is wrong to assume multicast
filtering is always enabled. By checking against IFF_MULTICAST, now
the driver behaves correctly when the multicast support is toggled by below
command:-
ip link set <devname> multicast off|on
Fixes:
0efedbf11f07a ("net: stmmac: xgmac: Fix XGMAC selftests")
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Verma, Aashish [Fri, 7 Feb 2020 07:33:54 +0000 (15:33 +0800)]
net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter
Without checking for IFF_MULTICAST flag, it is wrong to assume multicast
filtering is always enabled. By checking against IFF_MULTICAST, now
the driver behaves correctly when the multicast support is toggled by below
command:-
ip link set <devname> multicast off|on
Fixes:
477286b53f55 ("stmmac: add GMAC4 core support")
Signed-off-by: Verma, Aashish <aashishx.verma@intel.com>
Tested-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Fri, 7 Feb 2020 07:33:40 +0000 (15:33 +0800)]
net: stmmac: xgmac: fix incorrect XGMAC_VLAN_TAG register writting
We should always do a read of current value of XGMAC_VLAN_TAG instead of
directly overwriting the register value.
Fixes:
3cd1cfcba26e2 ("net: stmmac: Implement VLAN Hash Filtering in XGMAC")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tan, Tee Min [Fri, 7 Feb 2020 07:33:20 +0000 (15:33 +0800)]
net: stmmac: fix incorrect GMAC_VLAN_TAG register writting in GMAC4+
It should always do a read of current value of GMAC_VLAN_TAG instead of
directly overwriting the register value.
Fixes:
c1be0022df0d ("net: stmmac: Add VLAN HASH filtering support in GMAC4+")
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Thu, 6 Feb 2020 22:01:05 +0000 (14:01 -0800)]
hv_netvsc: Fix XDP refcnt for synthetic and VF NICs
The caller of XDP_SETUP_PROG has already incremented refcnt in
__bpf_prog_get(), so drivers should only increment refcnt by
num_queues - 1.
To fix the issue, update netvsc_xdp_set() to add the correct number
to refcnt.
Hold a refcnt in netvsc_xdp_set()’s other caller, netvsc_attach().
And, do the same in netvsc_vf_setxdp(). Otherwise, every time when VF is
removed and added from the host side, the refcnt will be decreased by one,
which may cause page fault when unloading xdp program.
Fixes:
351e1581395f ("hv_netvsc: Add XDP support")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Feb 2020 10:30:03 +0000 (11:30 +0100)]
Merge branch 'taprio-Some-fixes'
Vinicius Costa Gomes says:
====================
taprio: Some fixes
Changes from v3:
- Replaced ENOTSUPP error code with EOPNOTSUPP (Jakub Kicinski);
- Added the missing policy validation for the flags netlink argument
(Jakub Kicinski);
- Fixed the destroy() flow to also destroy the priority to traffic
class mapping (David Miller);
- Fixed dropping packets when taprio offloading is used together
with ETF offloading (more on this below);
Changes from v2:
- Squashed commits 2/3 and 3/3 into a single one (I think a single
commit is going to be easier to review);
- Removed an "improvement" that was causing changes in user visible
behavior;
Changes from v1:
- Fixed ignoring the 'flags' argument when adding a new
instance (Vladimir Oltean);
- Changed the order of commits;
Updated cover letter:
One bit that might need some attention is the fix for not dropping all
packets when taprio and ETF offloading are used, patch 5/5. The
behavior when the fix is applied is that packets that have a 'txtime'
that would fall outside of their transmission window are now dropped
by taprio. The question that might be raised is: should taprio be
responsible for dropping these packets, or should it be handled lower
in the stack?
My opinion is: taprio has all the information, and it's able to give
feeback to the user. Lower in the stack, those packets might go into
the void, and the only feedback could be a hard to find counter
increasing.
Patch 1/5: Reported by Po Liu, is more of a improvement of usability for
drivers implementing offloading features, now they can rely on the
value of dev->num_tc, instead of going through some hops to get this
value.
Patch 2/5: Use 'q->flags' as the source of truth for the offloading
flags. Tries to solidify the current behavior, while avoiding going
into invalid states, one of which was causing a "rcu stall" (more
information in the commit message).
Patch 3/5: Adds the missing netlink attribute validation for
TCA_TAPRIO_ATTR_FLAGS.
Patch 4/5: Replaces the usage of netdev_set_num_tc() with
netdev_reset_tc() in taprio_destroy(), taprio_destroy() is called when
applying a configuration fails, making sure that the device traffic
class configuration goes back to the default state.
@Vladimir: If possible, I would appreciate your Ack on patch 2/5. I
have been looking at this code for so long that I might have missed
something obvious (and my growing dislike for the word 'flags' may be
affecting my judgement :-).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:10 +0000 (13:46 -0800)]
taprio: Fix dropping packets when using taprio + ETF offloading
When using taprio offloading together with ETF offloading, configured
like this, for example:
$ tc qdisc replace dev $IFACE parent root handle 100 taprio \
num_tc 4 \
map 2 2 1 0 3 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE_TIME \
sched-entry S 01 1000000 \
sched-entry S 0e 1000000 \
flags 0x2
$ tc qdisc replace dev $IFACE parent 100:1 etf \
offload delta 300000 clockid CLOCK_TAI
During enqueue, it works out that the verification added for the
"txtime" assisted mode is run when using taprio + ETF offloading, the
only thing missing is initializing the 'next_txtime' of all the cycle
entries. (if we don't set 'next_txtime' all packets from SO_TXTIME
sockets are dropped)
Fixes:
4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:09 +0000 (13:46 -0800)]
taprio: Use taprio_reset_tc() to reset Traffic Classes configuration
When destroying the current taprio instance, which can happen when the
creation of one fails, we should reset the traffic class configuration
back to the default state.
netdev_reset_tc() is a better way because in addition to setting the
number of traffic classes to zero, it also resets the priority to
traffic classes mapping to the default value.
Fixes:
5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:08 +0000 (13:46 -0800)]
taprio: Add missing policy validation for flags
netlink policy validation for the 'flags' argument was missing.
Fixes:
4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:07 +0000 (13:46 -0800)]
taprio: Fix still allowing changing the flags during runtime
Because 'q->flags' starts as zero, and zero is a valid value, we
aren't able to detect the transition from zero to something else
during "runtime".
The solution is to initialize 'q->flags' with an invalid value, so we
can detect if 'q->flags' was set by the user or not.
To better solidify the behavior, 'flags' handling is moved to a
separate function. The behavior is:
- 'flags' if unspecified by the user, is assumed to be zero;
- 'flags' cannot change during "runtime" (i.e. a change() request
cannot modify it);
With this new function we can remove taprio_flags, which should reduce
the risk of future accidents.
Allowing flags to be changed was causing the following RCU stall:
[ 1730.558249] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1730.558258] rcu: 6-...0: (190 ticks this GP) idle=922/0/0x1 softirq=25580/25582 fqs=16250
[ 1730.558264] (detected by 2, t=65002 jiffies, g=33017, q=81)
[ 1730.558269] Sending NMI from CPU 2 to CPUs 6:
[ 1730.559277] NMI backtrace for cpu 6
[ 1730.559277] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G E 5.5.0-rc6+ #35
[ 1730.559278] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019
[ 1730.559278] RIP: 0010:__hrtimer_run_queues+0xe2/0x440
[ 1730.559278] Code: 48 8b 43 28 4c 89 ff 48 8b 75 c0 48 89 45 c8 e8 f4 bb 7c 00 0f 1f 44 00 00 65 8b 05 40 31 f0 68 89 c0 48 0f a3 05 3e 5c 25 01 <0f> 82 fc 01 00 00 48 8b 45 c8 48 89 df ff d0 89 45 c8 0f 1f 44 00
[ 1730.559279] RSP: 0018:
ffff9970802d8f10 EFLAGS:
00000083
[ 1730.559279] RAX:
0000000000000006 RBX:
ffff8b31645bff38 RCX:
0000000000000000
[ 1730.559280] RDX:
0000000000000000 RSI:
ffffffff9710f2ec RDI:
ffffffff978daf0e
[ 1730.559280] RBP:
ffff9970802d8f68 R08:
0000000000000000 R09:
0000000000000000
[ 1730.559280] R10:
0000018336d7944e R11:
0000000000000001 R12:
ffff8b316e39f9c0
[ 1730.559281] R13:
ffff8b316e39f940 R14:
ffff8b316e39f998 R15:
ffff8b316e39f7c0
[ 1730.559281] FS:
0000000000000000(0000) GS:
ffff8b316e380000(0000) knlGS:
0000000000000000
[ 1730.559281] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1730.559281] CR2:
00007f1105303760 CR3:
0000000227210005 CR4:
00000000003606e0
[ 1730.559282] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 1730.559282] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 1730.559282] Call Trace:
[ 1730.559282] <IRQ>
[ 1730.559283] ? taprio_dequeue_soft+0x2d0/0x2d0 [sch_taprio]
[ 1730.559283] hrtimer_interrupt+0x104/0x220
[ 1730.559283] ? irqtime_account_irq+0x34/0xa0
[ 1730.559283] smp_apic_timer_interrupt+0x6d/0x230
[ 1730.559284] apic_timer_interrupt+0xf/0x20
[ 1730.559284] </IRQ>
[ 1730.559284] RIP: 0010:cpu_idle_poll+0x35/0x1a0
[ 1730.559285] Code: 88 82 ff 65 44 8b 25 12 7d 73 68 0f 1f 44 00 00 e8 90 c3 89 ff fb 65 48 8b 1c 25 c0 7e 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 be a8 a8 00 85 c0 75 ed e8 75 48 84 ff
[ 1730.559285] RSP: 0018:
ffff997080137ea8 EFLAGS:
00000202 ORIG_RAX:
ffffffffffffff13
[ 1730.559285] RAX:
0000000000000001 RBX:
ffff8b316bc3c580 RCX:
0000000000000000
[ 1730.559286] RDX:
0000000000000001 RSI:
000000002819aad9 RDI:
ffffffff978da730
[ 1730.559286] RBP:
ffff997080137ec0 R08:
0000018324a6d387 R09:
0000000000000000
[ 1730.559286] R10:
0000000000000400 R11:
0000000000000001 R12:
0000000000000006
[ 1730.559286] R13:
ffff8b316bc3c580 R14:
0000000000000000 R15:
0000000000000000
[ 1730.559287] ? cpu_idle_poll+0x20/0x1a0
[ 1730.559287] ? cpu_idle_poll+0x20/0x1a0
[ 1730.559287] do_idle+0x4d/0x1f0
[ 1730.559287] ? complete+0x44/0x50
[ 1730.559288] cpu_startup_entry+0x1b/0x20
[ 1730.559288] start_secondary+0x142/0x180
[ 1730.559288] secondary_startup_64+0xb6/0xc0
[ 1776.686313] nvme nvme0: I/O 96 QID 1 timeout, completion polled
Fixes:
4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:06 +0000 (13:46 -0800)]
taprio: Fix enabling offload with wrong number of traffic classes
If the driver implementing taprio offloading depends on the value of
the network device number of traffic classes (dev->num_tc) for
whatever reason, it was going to receive the value zero. The value was
only set after the offloading function is called.
So, moving setting the number of traffic classes to before the
offloading function is called fixes this issue. This is safe because
this only happens when taprio is instantiated (we don't allow this
configuration to be changed without first removing taprio).
Fixes:
9c66d1564676 ("taprio: Add support for hardware offloading")
Reported-by: Po Liu <po.liu@nxp.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 6 Feb 2020 19:23:52 +0000 (11:23 -0800)]
net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port
The 7445 switch clocking profiles do not allow us to run the IMP port at
2Gb/sec in a way that it is reliable and consistent. Make sure that the
setting is only applied to the 7278 family.
Fixes:
8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 6 Feb 2020 19:07:45 +0000 (11:07 -0800)]
net: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan()
b53_configure_vlan() is called by the bcm_sf2 driver upon setup and
indirectly through resume as well. During the initial setup, we are
guaranteed that dev->vlan_enabled is false, so there is no change in
behavior, however during suspend, we may have enabled VLANs before, so we
do want to restore that setting.
Fixes:
dad8d7c6452b ("net: dsa: b53: Properly account for VLAN filtering")
Fixes:
967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dejin Zheng [Thu, 6 Feb 2020 15:29:17 +0000 (23:29 +0800)]
net: stmmac: fix a possible endless loop
It forgot to reduce the value of the variable retry in a while loop
in the ethqos_configure() function. It may cause an endless loop and
without timeout.
Fixes:
a7c30e62d4b8 ("net: stmmac: Add driver for Qualcomm ethqos")
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 6 Feb 2020 13:57:40 +0000 (13:57 +0000)]
rxrpc: Fix call RCU cleanup using non-bh-safe locks
rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
local->client_conns_lock. RCU callbacks, however, are normally called from
softirq context, which can cause lockdep to notice the locking
inconsistency.
To get lockdep to detect this, it's necessary to have the connection
cleaned up on the put at the end of the last of its calls, though normally
the clean up is deferred. This can be induced, however, by starting a call
on an AF_RXRPC socket and then closing the socket without reading the
reply.
Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
workqueue if in softirq-mode and defer the destruction to process context.
Note that another way to fix this could be to add a bunch of bh-disable
annotations to the spinlocks concerned - and there might be more than just
those two - but that means spending more time with BHs disabled.
Note also that some of these places were covered by bh-disable spinlocks
belonging to the rxrpc_transport object, but these got removed without the
_bh annotation being retained on the next lock in.
Fixes:
999b69f89241 ("rxrpc: Kill the client connection bundle concept")
Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 6 Feb 2020 13:55:01 +0000 (13:55 +0000)]
rxrpc: Fix service call disconnection
The recent patch that substituted a flag on an rxrpc_call for the
connection pointer being NULL as an indication that a call was disconnected
puts the set_bit in the wrong place for service calls. This is only a
problem if a call is implicitly terminated by a new call coming in on the
same connection channel instead of a terminating ACK packet.
In such a case, rxrpc_input_implicit_end_call() calls
__rxrpc_disconnect_call(), which is now (incorrectly) setting the
disconnection bit, meaning that when rxrpc_release_call() is later called,
it doesn't call rxrpc_disconnect_call() and so the call isn't removed from
the peer's error distribution list and the list gets corrupted.
KASAN finds the issue as an access after release on a call, but the
position at which it occurs is confusing as it appears to be related to a
different call (the call site is where the latter call is being removed
from the error distribution list and either the next or pprev pointer
points to a previously released call).
Fix this by moving the setting of the flag from __rxrpc_disconnect_call()
to rxrpc_disconnect_call() in the same place that the connection pointer
was being cleared.
Fixes:
5273a191dca6 ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Fri, 7 Feb 2020 10:01:40 +0000 (11:01 +0100)]
Merge branches 'pm-avs' and 'pm-cpuidle'
* pm-avs:
power: avs: qcom-cpr: Avoid clang -Wsometimes-uninitialized in cpr_scale
power: avs: qcom-cpr: add unspecified HAS_IOMEM dependency
PM / AVS: rockchip-io: fix the supply naming for the emmc supply on px30
power: avs: qcom-cpr: add a printout after the driver has been initialized
* pm-cpuidle:
cpuidle: Documentation: Clean up PM QoS description
intel_idle: Introduce 'states_off' module parameter
intel_idle: Introduce 'use_acpi' module parameter
Damien Le Moal [Wed, 25 Dec 2019 07:11:09 +0000 (16:11 +0900)]
zonefs: Add documentation
Add the new file Documentation/filesystems/zonefs.txt to document
zonefs principles and user-space tool usage.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Damien Le Moal [Wed, 25 Dec 2019 07:07:44 +0000 (16:07 +0900)]
fs: New zonefs file system
zonefs is a very simple file system exposing each zone of a zoned block
device as a file. Unlike a regular file system with zoned block device
support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices to the user. Files representing
sequential write zones of the device must be written sequentially
starting from the end of the file (append only writes).
As such, zonefs is in essence closer to a raw block device access
interface than to a full featured POSIX file system. The goal of zonefs
is to simplify the implementation of zoned block device support in
applications by replacing raw block device file accesses with a richer
file API, avoiding relying on direct block device file ioctls which may
be more obscure to developers. One example of this approach is the
implementation of LSM (log-structured merge) tree structures (such as
used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
to be stored in a zone file similarly to a regular file system rather
than as a range of sectors of a zoned device. The introduction of the
higher level construct "one file is one zone" can help reducing the
amount of changes needed in the application as well as introducing
support for different application programming languages.
Zonefs on-disk metadata is reduced to an immutable super block to
persistently store a magic number and optional feature flags and
values. On mount, zonefs uses blkdev_report_zones() to obtain the device
zone configuration and populates the mount point with a static file tree
solely based on this information. E.g. file sizes come from the device
zone type and write pointer offset managed by the device itself.
The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
under a common sub-directory:
* For conventional zones, the sub-directory "cnv" is used.
* For sequential write zones, the sub-directory "seq" is used.
These two directories are the only directories that exist in zonefs.
Users cannot create other directories and cannot rename nor delete
the "cnv" and "seq" sub-directories.
2) The name of zone files is the number of the file within the zone
type sub-directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
Conventional zone files cannot be truncated.
4) The size of sequential zone files represent the file's zone write
pointer position relative to the zone start sector. Truncating these
files is allowed only down to 0, in which case, the zone is reset to
rewind the zone write pointer position to the start of the zone, or
up to the zone size, in which case the file's zone is transitioned
to the FULL state (finish zone operation).
5) All read and write operations to files are not allowed beyond the
file zone size. Any access exceeding the zone size is failed with
the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files and
sub-directories is not allowed.
7) There are no restrictions on the type of read and write operations
that can be issued to conventional zone files. Buffered, direct and
mmap read & write operations are accepted. For sequential zone files,
there are no restrictions on read operations, but all write
operations must be direct IO append writes. mmap write of sequential
files is not allowed.
Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional
zones can be aggregated into a single larger file instead of the
default one file per zone.
* File ownership: The owner UID and GID of zone files is by default 0
(root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
changed.
The mkzonefs tool is used to format zoned block devices for use with
zonefs. This tool is available on Github at:
git@github.com:damien-lemoal/zonefs-tools.git.
zonefs-tools also includes a test suite which can be run against any
zoned block device, including null_blk block device created with zoned
mode.
Example: the following formats a 15TB host-managed SMR HDD with 256 MB
zones with the conventional zones aggregation feature enabled.
$ sudo mkzonefs -o aggr_cnv /dev/sdX
$ sudo mount -t zonefs /dev/sdX /mnt
$ ls -l /mnt/
total 0
dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
The size of the zone files sub-directories indicate the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a
single file).
$ ls -l /mnt/cnv
total
137101312
-rw-r----- 1 root root
140391743488 Nov 25 13:23 0
This aggregated conventional zone file can be used as a regular file.
$ sudo mkfs.ext4 /mnt/cnv/0
$ sudo mount -o loop /mnt/cnv/0 /data
The "seq" sub-directory grouping files for sequential write zones has
in this example 55356 zones.
$ ls -lv /mnt/seq
total
14511243264
-rw-r----- 1 root root 0 Nov 25 13:23 0
-rw-r----- 1 root root 0 Nov 25 13:23 1
-rw-r----- 1 root root 0 Nov 25 13:23 2
...
-rw-r----- 1 root root 0 Nov 25 13:23 55354
-rw-r----- 1 root root 0 Nov 25 13:23 55355
For sequential write zone files, the file size changes as data is
appended at the end of the file, similarly to any regular file system.
$ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.
000452219 s, 9.1 MB/s
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
The written file can be truncated to the zone size, preventing any
further write operation.
$ truncate -s
268435456 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root
268435456 Nov 25 13:49 /mnt/seq/0
Truncation to 0 size allows freeing the file zone storage space and
restart append-writes to the file.
$ truncate -s 0 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
Since files are statically mapped to zones on the disk, the number of
blocks of a file as reported by stat() and fstat() indicates the size
of the file zone.
$ stat /mnt/seq/0
File: /mnt/seq/0
Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
Device: 870h/2160d Inode: 50431 Links: 1
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-11-25 13:23:57.
048971997 +0900
Modify: 2019-11-25 13:52:25.
553805765 +0900
Change: 2019-11-25 13:52:25.
553805765 +0900
Birth: -
The number of blocks of the file ("Blocks") in units of 512B blocks
gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
to the device zone size in this example. Of note is that the "IO block"
field always indicates the minimum IO size for writes and corresponds
to the device physical sector size.
This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Al Viro [Mon, 16 Dec 2019 18:33:32 +0000 (13:33 -0500)]
fold struct fs_parameter_enum into struct constant_table
no real difference now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 7 Sep 2019 02:12:08 +0000 (22:12 -0400)]
fs_parse: get rid of ->enums
Don't do a single array; attach them to fsparam_enum() entry
instead. And don't bother trying to embed the names into those -
it actually loses memory, with no real speedup worth mentioning.
Simplifies validation as well.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 17 Dec 2019 19:15:04 +0000 (14:15 -0500)]
Pass consistent param->type to fs_parse()
As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable;
both get fs_value_is_string for ->type and NULL for ->string. To make
it even more unpleasant, that combination is impossible to produce with
fsconfig().
Much saner rules would be
"foo" => fs_value_is_flag, NULL
"foo=" => fs_value_is_string, ""
"foo=bar" => fs_value_is_string, "bar"
All cases are distinguishable, all results are expressable by fsconfig(),
->has_value checks are much simpler that way (to the point of the field
being useless) and quite a few regressions go away (gfs2 has no business
accepting -o nodebug=, for example).
Partially based upon patches from Miklos.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dave Airlie [Fri, 7 Feb 2020 02:29:35 +0000 (12:29 +1000)]
Merge tag 'amd-drm-next-5.6-2020-02-05' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.6-2020-02-05:
amdgpu:
- EDC fixes for Arcturus
- GDDR6 memory training fixe
- Fix for reading gfx clockgating registers while in GFXOFF state
- i2c freq fixes
- Misc display fixes
- TLB invalidation fix when using semaphores
- VCN 2.5 instancing fixes
- Switch raven1 gfxoff to a blacklist
- Coreboot workaround for KV/KB
- Root cause dongle fixes for display and revert workaround
- Enable GPU reset for renoir and navi
- Navi overclocking fixes
- Fix up confusing warnings in display clock validation on raven
amdkfd:
- SDMA fix
radeon:
- Misc LUT fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206035458.3894-1-alexander.deucher@amd.com
Dave Airlie [Fri, 7 Feb 2020 02:23:24 +0000 (12:23 +1000)]
Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next
Just a couple of fixes to Volta/Turing modesetting on some systems.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/
Dave Airlie [Fri, 7 Feb 2020 02:22:23 +0000 (12:22 +1000)]
Merge tag 'drm/tegra/for-5.6-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-next
drm/tegra: Fixes for v5.6-rc1
These are a couple of quick fixes for regressions that were found during
the first two weeks of the merge window.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206172753.2185390-1-thierry.reding@gmail.com
Steve French [Thu, 6 Feb 2020 18:32:03 +0000 (12:32 -0600)]
smb3: Add defines for new information level, FileIdInformation
See MS-FSCC 2.4.43. Valid to be quried from most
Windows servers (among others).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Steve French [Thu, 6 Feb 2020 23:31:56 +0000 (17:31 -0600)]
smb3: print warning once if posix context returned on open
SMB3.1.1 POSIX Context processing is not complete yet - so print warning
(once) if server returns it on open.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Steve French [Thu, 6 Feb 2020 22:04:59 +0000 (16:04 -0600)]
smb3: add one more dynamic tracepoint missing from strict fsync path
We didn't have a dynamic trace point for catching errors in
file_write_and_wait_range error cases in cifs_strict_fsync.
Since not all apps check for write behind errors, it can be
important for debugging to be able to trace these error
paths.
Suggested-and-reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Aurelien Aptel [Thu, 6 Feb 2020 17:16:55 +0000 (18:16 +0100)]
cifs: fix mode bits from dir listing when mounted with modefromsid
When mounting with -o modefromsid, the mode bits are stored in an
ACE. Directory enumeration (e.g. ls -l /mnt) triggers an SMB Query Dir
which does not include ACEs in its response. The mode bits in this
case are silently set to a default value of 755 instead.
This patch marks the dentry created during the directory enumeration
as needing re-evaluation (i.e. additional Query Info with ACEs) so
that the mode bits can be properly extracted.
Quick repro:
$ mount.cifs //win19.test/data /mnt -o ...,modefromsid
$ touch /mnt/foo && chmod 751 /mnt/foo
$ stat /mnt/foo
# reports 751 (OK)
$ sleep 2
# dentry older than 1s by default get invalidated
$ ls -l /mnt
# since dentry invalid, ls does a Query Dir
# and reports foo as 755 (WRONG)
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
David S. Miller [Thu, 6 Feb 2020 22:26:18 +0000 (23:26 +0100)]
Merge tag 'mlx5-fixes-2020-02-06' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-02-06
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.19:
('net/mlx5: IPsec, Fix esp modify function attribute')
('net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx')
For -stable v5.4:
('net/mlx5: Deprecate usage of generic TLS HW capability bit')
('net/mlx5: Fix deadlock in fs_core')
For -stable v5.5:
('net/mlx5e: TX, Error completion is for last WQE in batch')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Mon, 27 Jan 2020 12:18:14 +0000 (14:18 +0200)]
net/mlx5: Deprecate usage of generic TLS HW capability bit
Deprecate the generic TLS cap bit, use the new TX-specific
TLS cap bit instead.
Fixes:
a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Thu, 9 Jan 2020 13:53:37 +0000 (15:53 +0200)]
net/mlx5e: TX, Error completion is for last WQE in batch
For a cyclic work queue, when not requesting a completion per WQE,
a single CQE might indicate the completion of several WQEs.
However, in case some WQE in the batch causes an error, then an error
completion is issued, breaking the batch, and pointing to the offending
WQE in the wqe_counter field.
Hence, WQE-specific error CQE handling (like printing, breaking, etc...)
should be performed only for the last WQE in batch.
Fixes:
130c7b46c93d ("net/mlx5e: TX, Dump WQs wqe descriptors on CQE with error events")
Fixes:
fd9b4be8002c ("net/mlx5e: RX, Support multiple outstanding UMR posts")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Raed Salem [Wed, 23 Oct 2019 13:41:21 +0000 (16:41 +0300)]
net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx
SA context is allocated at mlx5_fpga_ipsec_create_sa_ctx,
however the counterpart mlx5_fpga_ipsec_delete_sa_ctx function
nullifies sa_ctx pointer without freeing the memory allocated,
hence the memory leak.
Fix by free SA context when the SA is released.
Fixes:
d6c4f0298cec ("net/mlx5: Refactor accel IPSec code")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Raed Salem [Tue, 24 Dec 2019 07:54:45 +0000 (09:54 +0200)]
net/mlx5: IPsec, Fix esp modify function attribute
The function mlx5_fpga_esp_validate_xfrm_attrs is wrongly used
with negative negation as zero value indicates success but it
used as failure return value instead.
Fix by remove the unary not negation operator.
Fixes:
05564d0ae075 ("net/mlx5: Add flow-steering commands for FPGA IPSec implementation")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maor Gottlieb [Mon, 27 Jan 2020 07:27:51 +0000 (09:27 +0200)]
net/mlx5: Fix deadlock in fs_core
free_match_list could be called when the flow table is already
locked. We need to pass this notation to tree_put_node.
It fixes the following lockdep warnning:
[ 1797.268537] ============================================
[ 1797.276837] WARNING: possible recursive locking detected
[ 1797.285101] 5.5.0-rc5+ #10 Not tainted
[ 1797.291641] --------------------------------------------
[ 1797.299917] handler10/9296 is trying to acquire lock:
[ 1797.307885]
ffff889ad399a0a0 (&node->lock){++++}, at:
tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.319694]
[ 1797.319694] but task is already holding lock:
[ 1797.330904]
ffff889ad399a0a0 (&node->lock){++++}, at:
nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
[ 1797.344707]
[ 1797.344707] other info that might help us debug this:
[ 1797.356952] Possible unsafe locking scenario:
[ 1797.356952]
[ 1797.368333] CPU0
[ 1797.373357] ----
[ 1797.378364] lock(&node->lock);
[ 1797.384222] lock(&node->lock);
[ 1797.390031]
[ 1797.390031] *** DEADLOCK ***
[ 1797.390031]
[ 1797.403003] May be due to missing lock nesting notation
[ 1797.403003]
[ 1797.414691] 3 locks held by handler10/9296:
[ 1797.421465] #0:
ffff889cf2c5a110 (&block->cb_lock){++++}, at:
tc_setup_cb_add+0x70/0x250
[ 1797.432810] #1:
ffff88a030081490 (&comp->sem){++++}, at:
mlx5_devcom_get_peer_data+0x4c/0xb0 [mlx5_core]
[ 1797.445829] #2:
ffff889ad399a0a0 (&node->lock){++++}, at:
nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
[ 1797.459913]
[ 1797.459913] stack backtrace:
[ 1797.469436] CPU: 1 PID: 9296 Comm: handler10 Kdump: loaded Not
tainted 5.5.0-rc5+ #10
[ 1797.480643] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS
2.4.3 01/17/2017
[ 1797.491480] Call Trace:
[ 1797.496701] dump_stack+0x96/0xe0
[ 1797.502864] __lock_acquire.cold.63+0xf8/0x212
[ 1797.510301] ? lockdep_hardirqs_on+0x250/0x250
[ 1797.517701] ? mark_held_locks+0x55/0xa0
[ 1797.524547] ? quarantine_put+0xb7/0x160
[ 1797.531422] ? lockdep_hardirqs_on+0x17d/0x250
[ 1797.538913] lock_acquire+0xd6/0x1f0
[ 1797.545529] ? tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.553701] down_write+0x94/0x140
[ 1797.560206] ? tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.568464] ? down_write_killable_nested+0x170/0x170
[ 1797.576925] ? del_hw_flow_group+0xde/0x1f0 [mlx5_core]
[ 1797.585629] tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.593891] ? free_match_list.part.25+0x147/0x170 [mlx5_core]
[ 1797.603389] free_match_list.part.25+0xe0/0x170 [mlx5_core]
[ 1797.612654] _mlx5_add_flow_rules+0x17e2/0x20b0 [mlx5_core]
[ 1797.621838] ? lock_acquire+0xd6/0x1f0
[ 1797.629028] ? esw_get_prio_table+0xb0/0x3e0 [mlx5_core]
[ 1797.637981] ? alloc_insert_flow_group+0x420/0x420 [mlx5_core]
[ 1797.647459] ? try_to_wake_up+0x4c7/0xc70
[ 1797.654881] ? lock_downgrade+0x350/0x350
[ 1797.662271] ? __mutex_unlock_slowpath+0xb1/0x3f0
[ 1797.670396] ? find_held_lock+0xac/0xd0
[ 1797.677540] ? mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
[ 1797.686467] mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
[ 1797.695134] ? _mlx5_add_flow_rules+0x20b0/0x20b0 [mlx5_core]
[ 1797.704270] ? irq_exit+0xa5/0x170
[ 1797.710764] ? retint_kernel+0x10/0x10
[ 1797.717698] ? mlx5_eswitch_set_rule_source_port.isra.9+0x122/0x230
[mlx5_core]
[ 1797.728708] mlx5_eswitch_add_offloaded_rule+0x465/0x6d0 [mlx5_core]
[ 1797.738713] ? mlx5_eswitch_get_prio_range+0x30/0x30 [mlx5_core]
[ 1797.748384] ? mlx5_fc_stats_work+0x670/0x670 [mlx5_core]
[ 1797.757400] mlx5e_tc_offload_fdb_rules.isra.27+0x24/0x90 [mlx5_core]
[ 1797.767665] mlx5e_tc_add_fdb_flow+0xaf8/0xd40 [mlx5_core]
[ 1797.776886] ? mlx5e_encap_put+0xd0/0xd0 [mlx5_core]
[ 1797.785562] ? mlx5e_alloc_flow.isra.43+0x18c/0x1c0 [mlx5_core]
[ 1797.795353] __mlx5e_add_fdb_flow+0x2e2/0x440 [mlx5_core]
[ 1797.804558] ? mlx5e_tc_update_neigh_used_value+0x8c0/0x8c0
[mlx5_core]
[ 1797.815093] ? wait_for_completion+0x260/0x260
[ 1797.823272] mlx5e_configure_flower+0xe94/0x1620 [mlx5_core]
[ 1797.832792] ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[ 1797.842096] ? down_read+0x11a/0x2e0
[ 1797.849090] ? down_write+0x140/0x140
[ 1797.856142] ? mlx5e_rep_indr_setup_block_cb+0xc0/0xc0 [mlx5_core]
[ 1797.866027] tc_setup_cb_add+0x11a/0x250
[ 1797.873339] fl_hw_replace_filter+0x25e/0x320 [cls_flower]
[ 1797.882385] ? fl_hw_destroy_filter+0x1c0/0x1c0 [cls_flower]
[ 1797.891607] fl_change+0x1d54/0x1fb6 [cls_flower]
[ 1797.899772] ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
[cls_flower]
[ 1797.910728] ? lock_downgrade+0x350/0x350
[ 1797.918187] ? __radix_tree_lookup+0xa5/0x130
[ 1797.926046] ? fl_set_key+0x1590/0x1590 [cls_flower]
[ 1797.934611] ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
[cls_flower]
[ 1797.945673] tc_new_tfilter+0xcd1/0x1240
[ 1797.953138] ? tc_del_tfilter+0xb10/0xb10
[ 1797.960688] ? avc_has_perm_noaudit+0x92/0x320
[ 1797.968721] ? avc_has_perm_noaudit+0x1df/0x320
[ 1797.976816] ? avc_has_extended_perms+0x990/0x990
[ 1797.985090] ? mark_lock+0xaa/0x9e0
[ 1797.991988] ? match_held_lock+0x1b/0x240
[ 1797.999457] ? match_held_lock+0x1b/0x240
[ 1798.006859] ? find_held_lock+0xac/0xd0
[ 1798.014045] ? symbol_put_addr+0x40/0x40
[ 1798.021317] ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1798.029460] ? tc_del_tfilter+0xb10/0xb10
[ 1798.036810] rtnetlink_rcv_msg+0x4d5/0x620
[ 1798.044236] ? rtnl_bridge_getlink+0x460/0x460
[ 1798.052034] ? lockdep_hardirqs_on+0x250/0x250
[ 1798.059837] ? match_held_lock+0x1b/0x240
[ 1798.067146] ? find_held_lock+0xac/0xd0
[ 1798.074246] netlink_rcv_skb+0xc6/0x1f0
[ 1798.081339] ? rtnl_bridge_getlink+0x460/0x460
[ 1798.089104] ? netlink_ack+0x440/0x440
[ 1798.096061] netlink_unicast+0x2d4/0x3b0
[ 1798.103189] ? netlink_attachskb+0x3f0/0x3f0
[ 1798.110724] ? _copy_from_iter_full+0xda/0x370
[ 1798.118415] netlink_sendmsg+0x3ba/0x6a0
[ 1798.125478] ? netlink_unicast+0x3b0/0x3b0
[ 1798.132705] ? netlink_unicast+0x3b0/0x3b0
[ 1798.139880] sock_sendmsg+0x94/0xa0
[ 1798.146332] ____sys_sendmsg+0x36c/0x3f0
[ 1798.153251] ? copy_msghdr_from_user+0x165/0x230
[ 1798.160941] ? kernel_sendmsg+0x30/0x30
[ 1798.167738] ___sys_sendmsg+0xeb/0x150
[ 1798.174411] ? sendmsg_copy_msghdr+0x30/0x30
[ 1798.181649] ? lock_downgrade+0x350/0x350
[ 1798.188559] ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1798.196239] ? __fget+0x21d/0x320
[ 1798.202335] ? do_dup2+0x2a0/0x2a0
[ 1798.208499] ? lock_downgrade+0x350/0x350
[ 1798.215366] ? __fget_light+0xd6/0xf0
[ 1798.221808] ? syscall_trace_enter+0x369/0x5d0
[ 1798.229112] __sys_sendmsg+0xd3/0x160
[ 1798.235511] ? __sys_sendmsg_sock+0x60/0x60
[ 1798.242478] ? syscall_trace_enter+0x233/0x5d0
[ 1798.249721] ? syscall_slow_exit_work+0x280/0x280
[ 1798.257211] ? do_syscall_64+0x1e/0x2e0
[ 1798.263680] do_syscall_64+0x72/0x2e0
[ 1798.269950] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes:
bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aurelien Aptel [Thu, 6 Feb 2020 12:49:26 +0000 (13:49 +0100)]
cifs: fix channel signing
The server var was accidentally used as an iterator over the global
list of connections, thus overwritten the passed argument. This
resulted in the wrong signing key being returned for extra channels.
Fix this by using a separate var to iterate.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Andreas Gruenbacher [Tue, 14 Jan 2020 16:12:18 +0000 (17:12 +0100)]
gfs2: fix O_SYNC write handling
In gfs2_file_write_iter, for direct writes, the error checking in the buffered
write fallback case is incomplete. This can cause inode write errors to go
undetected. Fix and clean up gfs2_file_write_iter along the way.
Based on a proposed fix by Christoph Hellwig <hch@lst.de>.
Fixes:
967bcc91b044 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Thierry Reding [Tue, 4 Feb 2020 13:59:26 +0000 (14:59 +0100)]
gpu: host1x: Set DMA direction only for DMA-mapped buffer objects
The DMA direction is only used by the DMA API, so there is no use in
setting it when a buffer object isn't mapped with the DMA API.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Thierry Reding [Tue, 4 Feb 2020 13:59:25 +0000 (14:59 +0100)]
drm/tegra: Reuse IOVA mapping where possible
This partially reverts the DMA API support that was recently merged
because it was causing performance regressions on older Tegra devices.
Unfortunately, the cache maintenance performed by dma_map_sg() and
dma_unmap_sg() causes performance to drop by a factor of 10.
The right solution for this would be to cache mappings for buffers per
consumer device, but that's a bit involved. Instead, we simply revert to
the old behaviour of sharing IOVA mappings when we know that devices can
do so (i.e. they share the same IOMMU domain).
Cc: <stable@vger.kernel.org> # v5.5
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Thierry Reding [Tue, 4 Feb 2020 13:59:24 +0000 (14:59 +0100)]
drm/tegra: Relax IOMMU usage criteria on old Tegra
Older Tegra devices only allow addressing 32 bits of memory, so whether
or not the host1x is attached to an IOMMU doesn't matter. host1x IOMMU
attachment is only needed on devices that can address memory beyond the
32-bit boundary and where the host1x doesn't support the wide GATHER
opcode that allows it to access buffers at higher addresses.
Cc: <stable@vger.kernel.org> # v5.5
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Linus Torvalds [Thu, 6 Feb 2020 17:07:45 +0000 (09:07 -0800)]
Merge tag 'kvm-5.6-2' of git://git./virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"s390:
- fix register corruption
- ENOTSUPP/EOPNOTSUPP mixed
- reset cleanups/fixes
- selftests
x86:
- Bug fixes and cleanups
- AMD support for APIC virtualization even in combination with
in-kernel PIT or IOAPIC.
MIPS:
- Compilation fix.
Generic:
- Fix refcount overflow for zero page"
* tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
x86: vmxfeatures: rename features for consistency with KVM and manual
KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
KVM: x86: Fix perfctr WRMSR for running counters
x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
kvm: mmu: Separate generating and setting mmio ptes
kvm: mmu: Replace unsigned with unsigned int for PTE access
KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
KVM: MIPS: Fix a build error due to referencing not-yet-defined function
x86/kvm: do not setup pv tlb flush when not paravirtualized
KVM: fix overflow of zero page refcount with ksm running
KVM: x86: Take a u64 when checking for a valid dr7 value
KVM: x86: use raw clock values consistently
KVM: x86: reorganize pvclock_gtod_data members
KVM: nVMX: delete meaningless nested_vmx_run() declaration
KVM: SVM: allow AVIC without split irqchip
kvm: ioapic: Lazy update IOAPIC EOI
...
Linus Torvalds [Thu, 6 Feb 2020 17:05:42 +0000 (09:05 -0800)]
Merge tag 'kgdb-fixes-5.6-rc1' of git://git./linux/kernel/git/danielt/linux
Pull kgdb fix from Daniel Thompson:
"One of the simplifications added for 5.6-rc1 has caused build
regressions on some platforms (it was reported for sparc64).
This fixes it with a revert"
* tag 'kgdb-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no regs"
Christoph Hellwig [Wed, 15 Jan 2020 15:38:29 +0000 (16:38 +0100)]
gfs2: move setting current->backing_dev_info
Set current->backing_dev_info just around the buffered write calls to
prepare for the next fix.
Fixes:
967bcc91b044 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Abhi Das [Tue, 4 Feb 2020 20:14:56 +0000 (14:14 -0600)]
gfs2: fix gfs2_find_jhead that returns uninitialized jhead with seq 0
When the first log header in a journal happens to have a sequence
number of 0, a bug in gfs2_find_jhead() causes it to prematurely exit,
and return an uninitialized jhead with seq 0. This can cause failures
in the caller. For instance, a mount fails in one test case.
The correct behavior is for it to continue searching through the journal
to find the correct journal head with the highest sequence number.
Fixes:
f4686c26ecc3 ("gfs2: read journal in large chunks")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Dan Carpenter [Mon, 13 Jan 2020 13:23:07 +0000 (16:23 +0300)]
nfsd4: fix double free in nfsd4_do_async_copy()
This frees "copy->nf_src" before and again after the goto.
Fixes:
ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Tue, 14 Jan 2020 17:02:44 +0000 (12:02 -0500)]
nfsd: convert file cache to use over/underflow safe refcount
Use the 'refcount_t' type instead of 'atomic_t' for improved
refcounting safety.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Tue, 14 Jan 2020 17:00:22 +0000 (12:00 -0500)]
nfsd: Define the file access mode enum for tracing
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Tue, 14 Jan 2020 17:00:21 +0000 (12:00 -0500)]
nfsd: Fix a perf warning
perf does not know how to deal with a __builtin_bswap32() call, and
complains. All other functions just store the xid etc in host endian
form, so let's do that in the tracepoint for nfsd_file_acquire too.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
zhengbin [Tue, 14 Jan 2020 12:39:45 +0000 (20:39 +0800)]
fuse: use true,false for bool variable
Fixes coccicheck warning:
fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Daniel W. S. Almeida [Wed, 29 Jan 2020 05:06:21 +0000 (02:06 -0300)]
Documentation: filesystems: convert fuse to RST
Converts fuse.txt to reStructuredText format, improving the presentation
without changing much of the underlying content.
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Wed, 5 Feb 2020 13:15:46 +0000 (08:15 -0500)]
fuse: Support RENAME_WHITEOUT flag
Allow fuse to pass RENAME_WHITEOUT to fuse server. Overlayfs on top of
virtiofs uses RENAME_WHITEOUT.
Without this patch renaming a directory in overlayfs (dir is on lower)
fails with -EINVAL. With this patch it works.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 6 Feb 2020 15:39:28 +0000 (16:39 +0100)]
fuse: don't overflow LLONG_MAX with end offset
Handle the special case of fuse_readpages() wanting to read the last page
of a hugest file possible and overflowing the end offset in the process.
This is basically to unbreak xfstests:generic/525 and prevent filesystems
from doing bad things with an overflowing offset.
Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 6 Feb 2020 15:39:28 +0000 (16:39 +0100)]
fix up iter on short count in fuse_direct_io()
fuse_direct_io() can end up advancing the iterator by more than the amount
of data read or written. This case is handled by the generic code if going
through ->direct_IO(), but not in the FOPEN_DIRECT_IO case.
Fix by reverting the extra bytes from the iterator in case of error or a
short count.
To test: install lxcfs, then the following testcase
int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY);
sendfile(1, fd, NULL,
16777216);
sendfile(1, fd, NULL,
16777216);
will spew WARN_ON() in iov_iter_pipe().
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes:
3c3db095b68c ("fuse: use iov_iter based generic splice helpers")
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Steve French [Thu, 6 Feb 2020 12:00:14 +0000 (06:00 -0600)]
cifs: add SMB3 change notification support
A commonly used SMB3 feature is change notification, allowing an
app to be notified about changes to a directory. The SMB3
Notify request blocks until the server detects a change to that
directory or its contents that matches the completion flags
that were passed in and the "watch_tree" flag (which indicates
whether subdirectories under this directory should be also
included). See MS-SMB2 2.2.35 for additional detail.
To use this simply pass in the following structure to ioctl:
struct __attribute__((__packed__)) smb3_notify {
uint32_t completion_filter;
bool watch_tree;
} __packed;
using CIFS_IOC_NOTIFY 0x4005cf09
or equivalently _IOW(CIFS_IOCTL_MAGIC, 9, struct smb3_notify)
SMB3 change notification is supported by all major servers.
The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Aurelien Aptel [Thu, 6 Feb 2020 09:19:11 +0000 (10:19 +0100)]
cifs: make multichannel warning more visible
When no interfaces are returned by the server we cannot open multiple
channels. Make it more obvious by reporting that to the user at the
VFS log level.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Thu, 6 Feb 2020 03:55:19 +0000 (13:55 +1000)]
cifs: fix soft mounts hanging in the reconnect code
RHBZ: 1795423
This is the SMB1 version of a patch we already have for SMB2
In recent DFS updates we have a new variable controlling how many times we will
retry to reconnect the share.
If DFS is not used, then this variable is initialized to 0 in:
static inline int
dfs_cache_get_nr_tgts(const struct dfs_cache_tgt_list *tl)
{
return tl ? tl->tl_numtgts : 0;
}
This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1
and never actually get to pass this conditional:
if (--retries)
continue;
The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN
and basically the kernel threads are virtually hung and unkillable.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Linus Torvalds [Thu, 6 Feb 2020 14:17:38 +0000 (14:17 +0000)]
Merge tag 'pci-v5.6-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Define to_pci_sysdata() always to fix build breakage when !CONFIG_PCI
(Jason A. Donenfeld)
- Use PF PASID for VFs to fix VF IOMMU bind failures (Kuppuswamy
Sathyanarayanan)
* tag 'pci-v5.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/ATS: Use PF PASID for VFs
x86/PCI: Define to_pci_sysdata() even when !CONFIG_PCI
Linus Torvalds [Thu, 6 Feb 2020 14:15:01 +0000 (14:15 +0000)]
Merge tag 'sound-fix-5.6-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of pending small fixes:
ALSA core:
- PCM memory leak fix
ASoC:
- Lots of SOF and Intel driver fixes
- Addition of COMMON_CLK for wcd934x
- Regression fixes for AMD and Tegra platforms
HD-audio:
- DP-MST HDMI regression fix, Tegra workarounds, HP quirk fix
Others:
- A few fixes relevant with the recent uapi-updates
- Sparse warnings and endianness fixes"
* tag 'sound-fix-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
ALSA: hda: Clear RIRB status before reading WP
ALSA: hda/realtek - Fixed one of HP ALC671 platform Headset Mic supported
ASoC: wcd934x: Add missing COMMON_CLK dependency to SND_SOC_ALL_CODECS
ALSA: hda - Fix DP-MST support for NVIDIA codecs
ASoC: wcd934x: Add missing COMMON_CLK dependency
MAINTAINERS: Remove the Bard Liao from the MAINTAINERS of Realtek CODECs
ASoC: tegra: Revert 24 and 32 bit support
ASoC: SOF: Intel: add PCI ID for JasperLake
ALSA: hdsp: Make the firmware loading ioctl a bit more readable
ALSA: emu10k1: Fix annotation and cast for the recent uapi header change
ALSA: dummy: Fix PCM format loop in proc output
ALSA: usb-audio: Annotate endianess in Scarlett gen2 quirk
ALSA: usb-audio: Fix endianess in descriptor validation
ALSA: hda: Add JasperLake PCI ID and codec vid
ALSA: pcm: Fix sparse warnings wrt snd_pcm_state_t
ALSA: pcm: Fix memory leak at closing a stream without hw_free
ALSA: uapi: Fix sparse warning
ASoC: rt715: Add __maybe_unused to PM callbacks
ASoC: rt711: Add __maybe_unused to PM callbacks
ASoC: rt700: Add __maybe_unused to PM callbacks
...
Florian Fainelli [Wed, 5 Feb 2020 20:32:04 +0000 (12:32 -0800)]
net: systemport: Avoid RBUF stuck in Wake-on-LAN mode
After a number of suspend and resume cycles, it is possible for the RBUF
to be stuck in Wake-on-LAN mode, despite the MPD enable bit being
cleared which instructed the RBUF to exit that mode.
Avoid creating that problematic condition by clearing the RX_EN and
TX_EN bits in the UniMAC prior to disable the Magic Packet Detector
logic which is guaranteed to make the RBUF exit Wake-on-LAN mode.
Fixes:
83e82f4c706b ("net: systemport: add Wake-on-LAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 5 Feb 2020 20:22:46 +0000 (21:22 +0100)]
r8169: fix performance regression related to PCIe max read request size
It turned out that on low performance systems the original change can
cause lower tx performance. On a N3450-based mini-PC tx performance
in iperf3 was reduced from 950Mbps to ~900Mbps. Therefore effectively
revert the original change, just use pcie_set_readrq() now instead of
changing the PCIe capability register directly.
Fixes:
2df49d365498 ("r8169: remove fiddling with the PCIe max read request size")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 5 Feb 2020 11:53:30 +0000 (14:53 +0300)]
net: sched: prevent a use after free
The bug is that we call kfree_skb(skb) and then pass "skb" to
qdisc_pkt_len(skb) on the next line, which is a use after free.
Also Cong Wang points out that it's better to delay the actual
frees until we drop the rtnl lock so we should use rtnl_kfree_skbs()
instead of kfree_skb().
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Fixes:
ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qian Cai [Tue, 4 Feb 2020 18:40:29 +0000 (13:40 -0500)]
skbuff: fix a data race in skb_queue_len()
sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
net/unix/af_unix.c:1761
____sys_sendmsg+0x33e/0x370
___sys_sendmsg+0xa6/0xf0
__sys_sendmsg+0x69/0xf0
__x64_sys_sendmsg+0x51/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe
write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
__skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
__skb_try_recv_datagram+0xbe/0x220
unix_dgram_recvmsg+0xee/0x850
____sys_recvmsg+0x1fb/0x210
___sys_recvmsg+0xa2/0xf0
__sys_recvmsg+0x66/0xf0
__x64_sys_recvmsg+0x51/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Since only the read is operating as lockless, it could introduce a logic
bug in unix_recvq_full() due to the load tearing. Fix it by adding
a lockless variant of skb_queue_len() and unix_recvq_full() where
READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
the commit
d7d16a89350a ("net: add skb_queue_empty_lockless()").
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 6 Feb 2020 12:21:01 +0000 (12:21 +0000)]
Merge tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
- a set of patches that fixes various corner cases in mount and umount
code (Xiubo Li). This has to do with choosing an MDS, distinguishing
between laggy and down MDSes and parsing the server path.
- inode initialization fixes (Jeff Layton). The one included here
mostly concerns things like open_by_handle() and there is another one
that will come through Al.
- copy_file_range() now uses the new copy-from2 op (Luis Henriques).
The existing copy-from op turned out to be infeasible for generic
filesystem use; we disable the copy offload if OSDs don't support
copy-from2.
- a patch to link "rbd" and "block" devices together in sysfs (Hannes
Reinecke)
... and a smattering of cleanups from Xiubo, Jeff and Chengguang.
* tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client: (25 commits)
rbd: set the 'device' link in sysfs
ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c
ceph: print name of xattr in __ceph_{get,set}xattr() douts
ceph: print r_direct_hash in hex in __choose_mds() dout
ceph: use copy-from2 op in copy_file_range
ceph: close holes in structs ceph_mds_session and ceph_mds_request
rbd: work around -Wuninitialized warning
ceph: allocate the correct amount of extra bytes for the session features
ceph: rename get_session and switch to use ceph_get_mds_session
ceph: remove the extra slashes in the server path
ceph: add possible_max_rank and make the code more readable
ceph: print dentry offset in hex and fix xattr_version type
ceph: only touch the caps which have the subset mask requested
ceph: don't clear I_NEW until inode metadata is fully populated
ceph: retry the same mds later after the new session is opened
ceph: check availability of mds cluster on mount after wait timeout
ceph: keep the session state until it is released
ceph: add __send_request helper
ceph: ensure we have a new cap before continuing in fill_inode
ceph: drop unused ttl_from parameter from fill_inode
...
Daniel Thompson [Thu, 6 Feb 2020 11:40:09 +0000 (11:40 +0000)]
Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no regs"
This reverts commit
bbfceba15f8d1260c328a254efc2b3f2deae4904.
When DBG_MAX_REG_NUM is zero then a number of symbols are conditionally
defined. It is therefore not possible to check it using C expressions.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Acked-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Takashi Iwai [Thu, 6 Feb 2020 10:58:44 +0000 (11:58 +0100)]
Merge tag 'asoc-v5.6-3' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fix for v5.6
An incremental fix for the Qualcomm COMMON_CLK issue.
Mohan Kumar [Thu, 6 Feb 2020 10:10:53 +0000 (15:40 +0530)]
ALSA: hda: Clear RIRB status before reading WP
RIRB interrupt status getting cleared after the write pointer is read
causes a race condition, where last response(s) into RIRB may remain
unserviced by IRQ, eventually causing azx_rirb_get_response to fall
back to polling mode. Clearing the RIRB interrupt status ahead of
write pointer access ensures that this condition is avoided.
Signed-off-by: Mohan Kumar <mkumard@nvidia.com>
Signed-off-by: Viswanath L <viswanathl@nvidia.com>
Link: https://lore.kernel.org/r/1580983853-351-1-git-send-email-viswanathl@nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Lorenzo Bianconi [Thu, 6 Feb 2020 09:14:39 +0000 (10:14 +0100)]
net: mvneta: move rx_dropped and rx_errors in per-cpu stats
Move rx_dropped and rx_errors counters in mvneta_pcpu_stats in order to
avoid possible races updating statistics
Fixes:
562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
Fixes:
dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management")
Fixes:
c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devulapally Shiva Krishna [Thu, 6 Feb 2020 08:34:43 +0000 (14:04 +0530)]
cxgb4: Added tls stats prints.
Added debugfs entry to show the tls stats.
Signed-off-by: Devulapally Shiva Krishna <shiva@chelsio.com>
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 5 Feb 2020 23:39:37 +0000 (00:39 +0100)]
mptcp: fix use-after-free for ipv6
Turns out that when we accept a new subflow, the newly created
inet_sk(tcp_sk)->pinet6 points at the ipv6_pinfo structure of the
listener socket.
This wasn't caught by the selftest because it closes the accepted fd
before the listening one.
adding a close(listenfd) after accept returns is enough:
BUG: KASAN: use-after-free in inet6_getname+0x6ba/0x790
Read of size 1 at addr
ffff88810e310866 by task mptcp_connect/2518
Call Trace:
inet6_getname+0x6ba/0x790
__sys_getpeername+0x10b/0x250
__x64_sys_getpeername+0x6f/0xb0
also alter test program to exercise this.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Hildenbrand [Wed, 5 Feb 2020 16:34:01 +0000 (17:34 +0100)]
virtio_balloon: Fix memory leaks on errors in virtballoon_probe()
We forget to put the inode and unmount the kernfs used for compaction.
Fixes:
71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Liang Li <liang.z.li@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200205163402.42627-3-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 5 Feb 2020 16:34:00 +0000 (17:34 +0100)]
virtio-balloon: Fix memory leak when unloading while hinting is in progress
When unloading the driver while hinting is in progress, we will not
release the free page blocks back to MM, resulting in a memory leak.
Fixes:
86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Liang Li <liang.z.li@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200205163402.42627-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Thu, 6 Feb 2020 07:40:58 +0000 (02:40 -0500)]
virtio_balloon: prevent pfn array overflow
Make sure, at build time, that pfn array is big enough to hold a single
page. It happens to be true since the PAGE_SHIFT value at the moment is
20, which is 1M - exactly 256 4K balloon pages.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Christoph Hellwig [Thu, 12 Dec 2019 16:37:19 +0000 (17:37 +0100)]
virtio-blk: remove VIRTIO_BLK_F_SCSI support
Since the need for a special flag to support SCSI passthrough on a
block device was added in May 2017 the SCSI passthrough support in
virtio-blk has been disabled. It has always been a bad idea
(just ask the original author..) and we have virtio-scsi for proper
passthrough. The feature also never made it into the virtio 1.0
or later specifications.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Daniel Verkamp [Fri, 3 Jan 2020 18:40:45 +0000 (10:40 -0800)]
virtio-pci: check name when counting MSI-X vectors
VQs without a name specified are not valid; they are skipped in the
later loop that assigns MSI-X vectors to queues, but the per_vq_vectors
loop above that counts the required number of vectors previously still
counted any queue with a non-NULL callback as needing a vector.
Add a check to the per_vq_vectors loop so that vectors with no name are
not counted to make the two loops consistent. This prevents
over-counting unnecessary vectors (e.g. for features which were not
negotiated with the device).
Cc: stable@vger.kernel.org
Fixes:
86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Wang, Wei W <wei.w.wang@intel.com>
Daniel Verkamp [Fri, 3 Jan 2020 18:40:43 +0000 (10:40 -0800)]
virtio-balloon: initialize all vq callbacks
Ensure that elements of the callbacks array that correspond to
unavailable features are set to NULL; previously, they would be left
uninitialized.
Since the corresponding names array elements were explicitly set to
NULL, the uninitialized callback pointers would not actually be
dereferenced; however, the uninitialized callbacks elements would still
be read in vp_find_vqs_msix() and used to calculate the number of MSI-X
vectors required.
Cc: stable@vger.kernel.org
Fixes:
86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Yangtao Li [Sun, 22 Dec 2019 19:08:39 +0000 (19:08 +0000)]
virtio-mmio: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code, which
contains platform_get_resource, devm_request_mem_region and
devm_ioremap.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Thu, 6 Feb 2020 08:13:23 +0000 (08:13 +0000)]
Merge branch 'for-next' of git://git./linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"A couple of changes:
- remove old CONFIG options from the m68knommu defconfig files
- fix a warning in the m68k non-MMU get_user() macro"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: fix memcpy() out of bounds warning in get_user()
m68k: configs: Cleanup old Kconfig IO scheduler options
Linus Torvalds [Thu, 6 Feb 2020 08:08:59 +0000 (08:08 +0000)]
Merge tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next
Pull smack fix from Casey Schaufler:
"One fix for an obscure error found using an old version of ping(1)
that did not use IPv6 sockets in the documented way"
* tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next:
broken ping to ipv6 linklocal addresses on debian buster
Linus Torvalds [Thu, 6 Feb 2020 07:58:38 +0000 (07:58 +0000)]
Merge tag 'xfs-5.6-merge-8' of git://git./fs/xfs/xfs-linux
Pull moar xfs updates from Darrick Wong:
"This contains the buffer error code refactoring I mentioned last week,
now that it has had extra time to complete the full xfs fuzz testing
suite to make sure there aren't any obvious new bugs"
* tag 'xfs-5.6-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix xfs_buf_ioerror_alert location reporting
xfs: remove unnecessary null pointer checks from _read_agf callers
xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers
xfs: remove the xfs_btree_get_buf[ls] functions
xfs: make xfs_trans_get_buf return an error code
xfs: make xfs_trans_get_buf_map return an error code
xfs: make xfs_buf_read return an error code
xfs: make xfs_buf_get_uncached return an error code
xfs: make xfs_buf_get return an error code
xfs: make xfs_buf_read_map return an error code
xfs: make xfs_buf_get_map return an error code
xfs: make xfs_buf_alloc return an error code
Linus Torvalds [Thu, 6 Feb 2020 07:12:11 +0000 (07:12 +0000)]
Merge tag 'trace-v5.6-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Added new "bootconfig".
This looks for a file appended to initrd to add boot config options,
and has been discussed thoroughly at Linux Plumbers.
Very useful for adding kprobes at bootup.
Only enabled if "bootconfig" is on the real kernel command line.
- Created dynamic event creation.
Merges common code between creating synthetic events and kprobe
events.
- Rename perf "ring_buffer" structure to "perf_buffer"
- Rename ftrace "ring_buffer" structure to "trace_buffer"
Had to rename existing "trace_buffer" to "array_buffer"
- Allow trace_printk() to work withing (some) tracing code.
- Sort of tracing configs to be a little better organized
- Fixed bug where ftrace_graph hash was not being protected properly
- Various other small fixes and clean ups
* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
bootconfig: Show the number of nodes on boot message
tools/bootconfig: Show the number of bootconfig nodes
bootconfig: Add more parse error messages
bootconfig: Use bootconfig instead of boot config
ftrace: Protect ftrace_graph_hash with ftrace_sync
ftrace: Add comment to why rcu_dereference_sched() is open coded
tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
tracing: Annotate ftrace_graph_hash pointer with __rcu
bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
tracing: Use seq_buf for building dynevent_cmd string
tracing: Remove useless code in dynevent_arg_pair_add()
tracing: Remove check_arg() callbacks from dynevent args
tracing: Consolidate some synth_event_trace code
tracing: Fix now invalid var_ref_vals assumption in trace action
tracing: Change trace_boot to use synth_event interface
tracing: Move tracing selftests to bottom of menu
tracing: Move mmio tracer config up with the other tracers
tracing: Move tracing test module configs together
tracing: Move all function tracing configs together
tracing: Documentation for in-kernel synthetic event API
...
Kailang Yang [Wed, 5 Feb 2020 07:40:01 +0000 (15:40 +0800)]
ALSA: hda/realtek - Fixed one of HP ALC671 platform Headset Mic supported
HP want to keep BIOS verb table for release platform.
So, it need to add 0x19 pin for quirk.
Fixes:
5af29028fd6d ("ALSA: hda/realtek - Add Headset Mic supported for HP cPC")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/r/74636ccb700a4cbda24c58a99dc430ce@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Thu, 6 Feb 2020 06:33:17 +0000 (06:33 +0000)]
Merge tag 'io_uring-5.6-2020-02-05' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"Some later fixes for io_uring:
- Small cleanup series from Pavel
- Belt and suspenders build time check of sqe size and layout
(Stefan)
- Addition of ->show_fdinfo() on request of Jann Horn, to aid in
understanding mapped personalities
- eventfd recursion/deadlock fix, for both io_uring and aio
- Fixup for send/recv handling
- Fixup for double deferral of read/write request
- Fix for potential double completion event for close request
- Adjust fadvise advice async/inline behavior
- Fix for shutdown hang with SQPOLL thread
- Fix for potential use-after-free of fixed file table"
* tag 'io_uring-5.6-2020-02-05' of git://git.kernel.dk/linux-block:
io_uring: cleanup fixed file data table references
io_uring: spin for sq thread to idle on shutdown
aio: prevent potential eventfd recursion on poll
io_uring: put the flag changing code in the same spot
io_uring: iterate req cache backwards
io_uring: punt even fadvise() WILLNEED to async context
io_uring: fix sporadic double CQE entry for close
io_uring: remove extra ->file check
io_uring: don't map read/write iovec potentially twice
io_uring: use the proper helpers for io_send/recv
io_uring: prevent potential eventfd recursion on poll
eventfd: track eventfd_signal() recursion depth
io_uring: add BUILD_BUG_ON() to assert the layout of struct io_uring_sqe
io_uring: add ->show_fdinfo() for the io_uring file descriptor
Linus Torvalds [Thu, 6 Feb 2020 06:15:23 +0000 (06:15 +0000)]
Merge tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"Some later arrivals, but all fixes at this point:
- bcache fix series (Coly)
- Series of BFQ fixes (Paolo)
- NVMe pull request from Keith with a few minor NVMe fixes
- Various little tweaks"
* tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block: (23 commits)
nvmet: update AEN list and array at one place
nvmet: Fix controller use after free
nvmet: Fix error print message at nvmet_install_queue function
brd: check and limit max_part par
nvme-pci: remove nvmeq->tags
nvmet: fix dsm failure when payload does not match sgl descriptor
nvmet: Pass lockdep expression to RCU lists
block, bfq: clarify the goal of bfq_split_bfqq()
block, bfq: get a ref to a group when adding it to a service tree
block, bfq: remove ifdefs from around gets/puts of bfq groups
block, bfq: extend incomplete name of field on_st
block, bfq: get extra ref to prevent a queue from being freed during a group move
block, bfq: do not insert oom queue into position tree
block, bfq: do not plug I/O for bfq_queues with no proc refs
bcache: check return value of prio_read()
bcache: fix incorrect data type usage in btree_flush_write()
bcache: add readahead cache policy options via sysfs interface
bcache: explicity type cast in bset_bkey_last()
bcache: fix memory corruption in bch_cache_accounting_clear()
xen/blkfront: limit allocated memory size to actual use case
...
Linus Torvalds [Thu, 6 Feb 2020 06:11:50 +0000 (06:11 +0000)]
Merge tag 'libata-5.6-2020-02-05' of git://git.kernel.dk/linux-block
Pull libata updates from Jens Axboe:
- Add a Sandisk CF card to supported pata_pcmcia list (Christian)
- Move pata_arasan_cf away from legacy API (Peter)
- Ensure ahci DMA/ints are shut down on shutdown (Prabhakar)
* tag 'libata-5.6-2020-02-05' of git://git.kernel.dk/linux-block:
ata: pata_arasan_cf: Use dma_request_chan() instead dma_request_slave_channel()
ata: ahci: Add shutdown to freeze hardware resources of ahci
pata_pcmia: add SanDisk High (>8G) CF card to supported list
Steve French [Thu, 6 Feb 2020 00:22:37 +0000 (18:22 -0600)]
cifs: Add tracepoints for errors on flush or fsync
Makes it easier to debug errors on writeback that happen later,
and are being returned on flush or fsync
For example:
writetest-17829 [002] .... 13583.407859: cifs_flush_err: ino=90 rc=-28
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Wed, 5 Feb 2020 22:52:11 +0000 (16:52 -0600)]
cifs: log warning message (once) if out of disk space
We ran into a confusing problem where an application wasn't checking
return code on close and so user didn't realize that the application
ran out of disk space. log a warning message (once) in these
cases. For example:
[ 8407.391909] Out of space writing to \\oleg-server\small-share
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Oleg Kravtsov <oleg@tuxera.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Geert Uytterhoeven [Wed, 5 Feb 2020 19:46:49 +0000 (20:46 +0100)]
of: clk: Make <linux/of_clk.h> self-contained
Depending on include order:
include/linux/of_clk.h:11:45: warning: ‘struct device_node’ declared inside parameter list will not be visible outside of this definition or declaration
unsigned int of_clk_get_parent_count(struct device_node *np);
^~~~~~~~~~~
include/linux/of_clk.h:12:43: warning: ‘struct device_node’ declared inside parameter list will not be visible outside of this definition or declaration
const char *of_clk_get_parent_name(struct device_node *np, int index);
^~~~~~~~~~~
include/linux/of_clk.h:13:31: warning: ‘struct of_device_id’ declared inside parameter list will not be visible outside of this definition or declaration
void of_clk_init(const struct of_device_id *matches);
^~~~~~~~~~~~
Fix this by adding forward declarations for struct device_node and
struct of_device_id.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lkml.kernel.org/r/20200205194649.31309-1-geert+renesas@glider.be
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Masami Hiramatsu [Wed, 5 Feb 2020 13:50:23 +0000 (22:50 +0900)]
bootconfig: Show the number of nodes on boot message
Show the number of bootconfig nodes on boot message.
Link: http://lkml.kernel.org/r/158091062297.27924.9051634676068550285.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Wed, 5 Feb 2020 13:50:13 +0000 (22:50 +0900)]
tools/bootconfig: Show the number of bootconfig nodes
Show the number of bootconfig nodes when applying new bootconfig to
initrd.
Since there are limitations of bootconfig not only in its filesize,
but also the number of nodes, the number should be shown when applying
so that user can get the feeling of scale of current bootconfig.
Link: http://lkml.kernel.org/r/158091061337.27924.10886706631693823982.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Wed, 5 Feb 2020 13:50:04 +0000 (22:50 +0900)]
bootconfig: Add more parse error messages
Add more error messages for following cases.
- Exceeding max number of nodes
- Config tree data is empty (e.g. comment only)
- Config data is empty or exceeding max size
- bootconfig is already initialized
Link: http://lkml.kernel.org/r/158091060401.27924.9024818742827122764.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Wed, 5 Feb 2020 13:49:54 +0000 (22:49 +0900)]
bootconfig: Use bootconfig instead of boot config
Use "bootconfig" (1 word) instead of "boot config" (2 words)
in the boot message.
Link: http://lkml.kernel.org/r/158091059459.27924.14414336187441539879.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Wed, 5 Feb 2020 14:20:32 +0000 (09:20 -0500)]
ftrace: Protect ftrace_graph_hash with ftrace_sync
As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.
Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72
Cc: stable@vger.kernel.org
Fixes:
b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Casey Schaufler [Mon, 3 Feb 2020 17:15:00 +0000 (09:15 -0800)]
broken ping to ipv6 linklocal addresses on debian buster
I am seeing ping failures to IPv6 linklocal addresses with Debian
buster. Easiest example to reproduce is:
$ ping -c1 -w1 ff02::1%eth1
connect: Invalid argument
$ ping -c1 -w1 ff02::1%eth1
PING ff02::01%eth1(ff02::1%eth1) 56 data bytes
64 bytes from fe80::e0:f9ff:fe0c:37%eth1: icmp_seq=1 ttl=64 time=0.059 ms
git bisect traced the failure to
commit
b9ef5513c99b ("smack: Check address length before reading address family")
Arguably ping is being stupid since the buster version is not setting
the address family properly (ping on stretch for example does):
$ strace -e connect ping6 -c1 -w1 ff02::1%eth1
connect(5, {sa_family=AF_UNSPEC,
sa_data="\4\1\0\0\0\0\377\2\0\0\0\0\0\0\0\0\0\0\0\0\0\1\3\0\0\0"}, 28)
= -1 EINVAL (Invalid argument)
but the command works fine on kernels prior to this commit, so this is
breakage which goes against the Linux paradigm of "don't break userspace"
Cc: stable@vger.kernel.org
Reported-by: David Ahern <dsahern@gmail.com>
Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
security/smack/smack_lsm.c | 41 +++++++++++++++++++----------------------
1 file changed, 19 insertions(+), 22 deletions(-)
Steven Rostedt (VMware) [Wed, 5 Feb 2020 07:17:57 +0000 (02:17 -0500)]
ftrace: Add comment to why rcu_dereference_sched() is open coded
Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Amol Grover [Wed, 5 Feb 2020 05:57:02 +0000 (11:27 +0530)]
tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison
Use rcu_dereference_protected to dereference the newly annotated pointer.
Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Amol Grover [Sat, 1 Feb 2020 07:27:04 +0000 (12:57 +0530)]
tracing: Annotate ftrace_graph_hash pointer with __rcu
Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison
Use rcu_dereference_protected to access the __rcu annotated pointer.
Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masahiro Yamada [Wed, 5 Feb 2020 06:51:52 +0000 (15:51 +0900)]
kbuild: make multiple directory targets work
Currently, the single-target build does not work when two
or more sub-directories are given:
$ make fs/ kernel/ lib/
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
DESCEND objtool
make[2]: Nothing to be done for 'kernel/'.
make[2]: Nothing to be done for 'fs/'.
make[2]: Nothing to be done for 'lib/'.
Make it work properly.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Daniel Borkmann [Wed, 5 Feb 2020 21:06:09 +0000 (22:06 +0100)]
Merge branch 'bpf-xsk-fixes'
Maciej Fijalkowski says:
====================
Cameron reported [0] that on fresh bpf-next he could not run multiple
xdpsock instances in Tx-only mode on single network interface with i40e
driver.
Turns out that Maxim's series [1] which was adding RCU protection around
ndo_xsk_wakeup added check against the __I40E_CONFIG_BUSY being set on
pf->state within i40e_xsk_wakeup() - if it's set, return -ENETDOWN.
Since this bit is set per PF when UMEM is being enabled/disabled, the
situation Cameron stumbled upon was that when he launched second xdpsock
instance, second UMEM was being registered, hence set __I40E_CONFIG_BUSY
which is now observed by first xdpsock and therefore xdpsock's kick_tx()
gets -ENETDOWN as errno.
-ENETDOWN currently is not allowed in kick_tx(), so we were exiting the
first application. Such exit means also XDP program being unloaded and
its dedicated resources, which caused an -ENXIO being return in the
second xdpsock instance.
Let's fix the issue from both sides - protect ourselves from future
xdpsock crashes by allowing for -ENETDOWN errno being set in kick_tx()
(patch 3) and from driver side, return -EAGAIN for the case where PF is
busy (patch 1).
Remove also doubled variable from xdpsock_user.c (patch 2).
Note that ixgbe seems not to be affected since UMEM registration sets
the busy/disable bit per ring, not per PF.
[0]: https://www.spinics.net/lists/xdp-newbies/msg01558.html
[1]: https://lore.kernel.org/netdev/
20191217162023.16011-1-maximmi@mellanox.com/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>