WANG Cong [Thu, 20 Oct 2016 21:19:46 +0000 (14:19 -0700)]
ipv4: use the right lock for ping_group_range
This reverts commit a681574c99be23e4d20b769bf0e543239c364af5
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).
Then, since we already have a lock for ping_group_range, the call sites
using ip_local_ports.lock for ping_group_range are clearly typos.
We might consider sharing a single lock between ping_group_range
and local_port_range to save space, but that should be for
net-next.
Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()")
Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 22 Oct 2016 20:17:54 +0000 (16:17 -0400)]
Merge branch 'dsa-bcm_sf2-do-not-rely-on-kexec_in_progress'
Florian Fainelli says:
====================
net: dsa: bcm_sf2: Do not rely on kexec_in_progress
These are the two patches following the discussion we had on kexec_in_progress.
Feel free to apply or discard them, thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 21 Oct 2016 21:21:56 +0000 (14:21 -0700)]
net: dsa: bcm_sf2: Do not rely on kexec_in_progress
After discussing with Eric, it turns out that, while using
kexec_in_progress is a nice optimization that saves us from always
powering on the integrated PHY, we can just turn it on unconditionally
in the shutdown path.
This removes a dependency on kexec_in_progress which, according to Eric,
should not be used by modules.
Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 21 Oct 2016 21:21:55 +0000 (14:21 -0700)]
Revert "kexec: Export kexec_in_progress to modules"
This reverts commit 97dcaa0fcfd24daa9a36c212c1ad1d5a97759212. Based on
the review discussion with Eric, we will come up with a different fix
for the bcm_sf2 driver which does not make it rely on the
kexec_in_progress value.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sat, 22 Oct 2016 01:49:14 +0000 (21:49 -0400)]
netns: revert "netns: avoid disabling irq for netns id"
This reverts commit bc51dddf98c9 ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:
* http://marc.info/?t=147694653900002&r=1&w=2
Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages. A tracking issue for this can be
found below:
* https://github.com/linux-audit/audit-kernel/issues/23
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Elad Raz <e@eladraz.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Thu, 20 Oct 2016 06:35:12 +0000 (23:35 -0700)]
ipv6: fix a potential deadlock in do_ipv6_setsockopt()
Baozeng reported this deadlock case:
       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET6);
                               lock(rtnl_mutex);
                               lock(sk_lock-AF_INET6);
  lock(rtnl_mutex);
Similar to commit 87e9f0315952
("ipv4: fix a potential deadlock in mcast getsockopt() path"),
this is because we still have a case, ipv6_sock_mc_close(),
where we acquire sk_lock before rtnl_lock. Close this deadlock
with a similar solution, that is, always acquire the rtnl lock first.
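The lock-ordering rule described above can be illustrated with a small userspace sketch. This is not the kernel code: ex_rtnl, ex_sk_lock, ex_setsockopt_join() and ex_mc_close() are hypothetical stand-ins, and pthread mutexes replace rtnl_mutex and the socket lock. The only point is that both paths take the locks in the same order.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for rtnl_mutex and the per-socket lock. */
static pthread_mutex_t ex_rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ex_sk_lock = PTHREAD_MUTEX_INITIALIZER;
static int ex_mc_memberships = 3;

/* Join path: ex_rtnl first, then ex_sk_lock, released in reverse order. */
static int ex_setsockopt_join(void)
{
	int n;

	pthread_mutex_lock(&ex_rtnl);
	pthread_mutex_lock(&ex_sk_lock);
	n = ++ex_mc_memberships;
	pthread_mutex_unlock(&ex_sk_lock);
	pthread_mutex_unlock(&ex_rtnl);
	return n;
}

/* Close path: same order as the join path, so no ABBA cycle can form. */
static int ex_mc_close(void)
{
	int dropped;

	pthread_mutex_lock(&ex_rtnl);
	pthread_mutex_lock(&ex_sk_lock);
	assert(ex_mc_memberships >= 0);
	dropped = ex_mc_memberships;
	ex_mc_memberships = 0;
	pthread_mutex_unlock(&ex_sk_lock);
	pthread_mutex_unlock(&ex_rtnl);
	return dropped;
}
```

If ex_mc_close() took ex_sk_lock before ex_rtnl, two threads running the two functions concurrently could each hold one lock and wait forever for the other.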
Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Oct 2016 14:25:22 +0000 (10:25 -0400)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
Geert Uytterhoeven.
2) Fix wrong timeout in set elements added from packet path via
nft_dynset, from Anders K. Pedersen.
3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
documentation, from Nicolas Dichtel.
4) Ensure proper initialization of log flags via xt_LOG, from
Liping Zhang.
5) Missing alias to autoload ipcomp, also from Liping Zhang.
6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.
7) Wrong integer type in the new nft_parse_u32_check() function,
from Dan Carpenter.
8) Another wrong integer type declaration in nft_exthdr_init, also
from Dan Carpenter.
9) Fix insufficient mode validation in nft_range.
10) Fix compilation warning in nft_range due to possible uninitialized
value, from Arnd Bergmann.
11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
calm down kmemcheck, from Florian Westphal.
12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
from Nicolas Dichtel.
13) Fix nf_queue() after conversion to single-linked hook list, related
to incorrect bypass flag handling and incorrect hook point of
reinjection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 21 Oct 2016 01:15:16 +0000 (18:15 -0700)]
kexec: Export kexec_in_progress to modules
The bcm_sf2 driver uses kexec_in_progress to know whether it can power
down an integrated PHY during shutdown, and can be built as a module.
Other modules may be using this in the future, so export it.
Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2016 17:26:48 +0000 (10:26 -0700)]
ipv4: disable BH in set_ping_group_range()
In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port
range") Cong added BH protection in set_local_port_range() but missed
that the same fix was needed in set_ping_group_range().
Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2016 16:39:40 +0000 (09:39 -0700)]
udp: must lock the socket in udp_disconnect()
Baozeng Ding reported KASAN traces showing uses after free in
udp_lib_get_port() and other related UDP functions.
A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.
I could write a reproducer with two threads doing:
    static int sock_fd;
    static void *thr1(void *arg)
    {
        for (;;) {
            connect(sock_fd, (const struct sockaddr *)arg,
                    sizeof(struct sockaddr_in));
        }
    }
    static void *thr2(void *arg)
    {
        struct sockaddr_in unspec;
        for (;;) {
            memset(&unspec, 0, sizeof(unspec));
            connect(sock_fd, (const struct sockaddr *)&unspec,
                    sizeof(unspec));
        }
    }
Problem is that udp_disconnect() could run without holding socket lock,
and this was causing list corruptions.
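One common way to close this kind of race is a locked wrapper around an unlocked helper. The sketch below shows that pattern in userspace C under stated assumptions: struct ex_sock, ex_disconnect_locked() and ex_disconnect() are hypothetical names, and a pthread mutex stands in for the socket lock. It is not taken from the patch itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical socket: the mutex stands in for the kernel socket lock. */
struct ex_sock {
	pthread_mutex_t lock;
	int connected;
	unsigned short port;
};

/* Does the actual state change. Caller must hold sk->lock. */
static void ex_disconnect_locked(struct ex_sock *sk)
{
	sk->connected = 0;
	sk->port = 0;
}

/* Public entry point: takes the lock so concurrent callers serialize. */
static void ex_disconnect(struct ex_sock *sk)
{
	assert(sk != NULL);
	pthread_mutex_lock(&sk->lock);
	ex_disconnect_locked(sk);
	pthread_mutex_unlock(&sk->lock);
}
```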
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 20 Oct 2016 16:32:19 +0000 (09:32 -0700)]
net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels
For a kernel that is being kexec'd we re-enable the integrated GPHY in
order for the subsequent MDIO bus scan to succeed and properly bind to
the bcm7xxx PHY driver. If we did not do that, the GPHY would be shut
down by the time the MDIO driver is probing the bus, and it would fail
to read the correct PHY OUI and therefore bind to an appropriate PHY
driver. Later on, this would cause DSA not to be able to successfully
attach to the PHY, and the interface would not be created at all.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Oct 2016 15:13:53 +0000 (17:13 +0200)]
bpf, test: fix ld_abs + vlan push/pop stress test
After commit 636c2628086e ("net: skbuff: Remove errornous length
validation in skb_vlan_pop()") the mentioned test case stopped working,
throwing a -12 (ENOMEM) return code. The issue however is not due to
636c2628086e, but rather due to a buggy test case that got uncovered
by the change in behaviour in 636c2628086e.
The data_size of that test case for the skb was set to 1. In the
bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
loop with: reading skb data, pushing 68 tags, reading skb data,
popping 68 tags, reading skb data, etc, in order to force a skb
expansion and thus trigger that JITs recache skb->data. The problem is
that the initial data_size is too small.
Before 636c2628086e, the test silently bailed out, returning 0, due to
the skb->len < VLAN_ETH_HLEN check; now it throws an error from the
failing skb_ensure_writable(). Set at least a minimum of ETH_HLEN as
the initial length so that on the first push of data, the equivalent
pop will succeed.
Fixes: 4d9c5c53ac99 ("test_bpf: add bpf_skb_vlan_push/pop() tests")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Thu, 20 Oct 2016 13:58:02 +0000 (15:58 +0200)]
net: add recursion limit to GRO
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.
This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
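The recursion counter described above could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: struct ex_pkt, ex_gro_receive() and the value of EX_RECURSION_LIMIT are hypothetical, and the counter lives in the packet struct in the same spirit as a field in the GRO CB.

```c
#include <assert.h>
#include <stddef.h>

#define EX_RECURSION_LIMIT 15	/* hypothetical limit for this sketch */

struct ex_pkt {
	size_t nested_headers;	/* encapsulation headers still to parse */
	int recursion_counter;	/* per-packet depth counter */
	int aborted;		/* set when aggregation is given up */
};

/* Parses one header per call and recurses; returns headers parsed. */
static size_t ex_gro_receive(struct ex_pkt *pkt)
{
	assert(pkt != NULL);
	if (pkt->nested_headers == 0)
		return 0;
	if (++pkt->recursion_counter == EX_RECURSION_LIMIT) {
		pkt->aborted = 1;	/* fall back to normal processing */
		return 0;
	}
	pkt->nested_headers--;
	return 1 + ex_gro_receive(pkt);
}
```

With this shape, a packet made only of nested headers stops recursing after a bounded number of calls instead of growing the stack with the packet length.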
Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Bohac [Thu, 20 Oct 2016 10:29:26 +0000 (12:29 +0200)]
ipv6: properly prevent temp_prefered_lft sysctl race
The check for an underflow of tmp_prefered_lft is always false
because tmp_prefered_lft is unsigned. The intention of the check
was to guard against racing with an update of the
temp_prefered_lft sysctl, potentially resulting in an underflow.
As suggested by David Miller, the best way to prevent the race is
by reading the sysctl variable using READ_ONCE.
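A minimal userspace sketch of the idea, assuming hypothetical names throughout (ex_temp_prefered_lft, ex_read_once(), ex_tmp_prefered_lft()): the shared value is read once into a local, and every later use works on that snapshot. A volatile read only emulates the spirit of READ_ONCE here; it is not the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sysctl-like value that another thread may rewrite. */
static volatile unsigned int ex_temp_prefered_lft = 86400;

/* Performs exactly one read of the shared value. */
static unsigned int ex_read_once(const volatile unsigned int *p)
{
	assert(p != NULL);
	return *p;
}

/* Preferred lifetime minus a desync factor, never wrapping around. */
static unsigned int ex_tmp_prefered_lft(unsigned int desync)
{
	unsigned int cfg = ex_read_once(&ex_temp_prefered_lft);

	/*
	 * Clamp and subtract against the same snapshot. Testing the
	 * unsigned result for "< 0" afterwards could never be true.
	 */
	if (desync > cfg)
		desync = cfg;
	return cfg - desync;
}
```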
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Mon, 17 Oct 2016 17:05:32 +0000 (18:05 +0100)]
netfilter: fix nf_queue handling
nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
list_head with single linked list") for two reasons:
1) If the bypass flag is set on, there are no userspace listeners and
we still have more hook entries to iterate over, then jump to the
next hook. Otherwise accept the packet. On nf_reinject() path, the
okfn() needs to be invoked.
2) We should not re-enter the same hook on packet reinjection. If the
packet is accepted, we have to skip the current hook from where the
packet was enqueued, otherwise the packet gets enqueued over and
over again.
This restores the previous list_for_each_entry_continue() behaviour
happening from nf_iterate() that was dealing with these two cases.
This patch introduces a new nf_queue() wrapper function so this fix
becomes simpler.
Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Nicolas Dichtel [Tue, 18 Oct 2016 12:37:32 +0000 (14:37 +0200)]
netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reached
When the maximum evictions number is reached, do not wait 5 seconds before
the next run.
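The rescheduling decision can be sketched as a small helper. The names and the quota value are assumptions for illustration (EX_GC_MAX_EVICTS, EX_GC_INTERVAL_MS, ex_next_gc_delay_ms()); only the 5 second interval comes from the text above.

```c
#include <assert.h>

#define EX_GC_MAX_EVICTS  256u	/* hypothetical per-run eviction quota */
#define EX_GC_INTERVAL_MS 5000u	/* the regular 5 second interval */

/* Returns the delay before the next gc run, in milliseconds. */
static unsigned int ex_next_gc_delay_ms(unsigned int evicted)
{
	/* A run stops at the quota, so it never evicts more than that. */
	assert(evicted <= EX_GC_MAX_EVICTS);

	/* Quota hit: expired entries probably remain, so rerun at once. */
	if (evicted == EX_GC_MAX_EVICTS)
		return 0;
	return EX_GC_INTERVAL_MS;
}
```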
CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Giuseppe CAVALLARO [Thu, 20 Oct 2016 08:01:28 +0000 (10:01 +0200)]
stmmac: display the descriptors if DES0 = 0
It makes sense to display the descriptors even if
DES0 is zero. This helps, for example, when it is
necessary to dump rx write-back descriptors to get
the timestamp status.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Oct 2016 15:23:08 +0000 (11:23 -0400)]
Merge branch 'ncsi-fixes'
Gavin Shan says:
====================
net/ncsi: More bug fixes
This series fixes 2 issues that were found during NCSI's availability
testing on BCM5718 and improves HNCDSC AEN handler:
* PATCH[1] refactors the code so that minimal code change is put
to PATCH[2].
* PATCH[2] fixes the NCSI channel's stale link state before doing
failover.
* PATCH[3] chooses the hot channel, which was previously chosen as the
active channel, when the available channels are all in link-down state.
* PATCH[4] improves Host Network Controller Driver Status Change
(HNCDSC) AEN handler
Changelog
=========
v2:
* Merged PATCH[v1 1/2] to PATCH[v2 1].
* Avoid if/else statements in ncsi_suspend_channel() as Joel suggested.
* Added comments to explain why we need retrieve last link states in
ncsi_suspend_channel().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:52 +0000 (11:45 +1100)]
net/ncsi: Improve HNCDSC AEN handler
This improves AEN handler for Host Network Controller Driver Status
Change (HNCDSC):
* The channel's lock should be held when accessing its state.
* Do failover when host driver isn't ready.
* Configure channel when host driver becomes ready.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:51 +0000 (11:45 +1100)]
net/ncsi: Choose hot channel as active one if necessary
The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. C0 is in link-up state while C1 is in link-down
state. C0 is chosen as the active channel until C0's cable is unplugged
and plugged back: on unplugging C0's cable, an LSC (Link State Change)
AEN packet is received on C0 to report the link-down event. After that,
C1 is chosen as the active channel. The LSC AEN for the link-up event is
lost on C0 when plugging C0's cable back. We lose the network even
though C0 is usable.
This resolves the issue by recording the (hot) channel that was
previously chosen as the active one. The hot channel is chosen to be the
active one if none of the available channels is in link-up state. With
this, C0 is still the active one after unplugging C0's cable, and the
LSC AEN packet is received on C0 when plugging its cable back.
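The selection rule above can be sketched as follows. This is an illustration only: struct ex_channel and ex_choose_active() are hypothetical names, and the real driver tracks far more state per channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of an NCSI channel. */
struct ex_channel {
	int id;
	int link_up;
};

/* Pick the first link-up channel; if none is up, keep the hot one. */
static const struct ex_channel *
ex_choose_active(const struct ex_channel *chs, size_t n,
		 const struct ex_channel *hot)
{
	size_t i;

	assert(chs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (chs[i].link_up)
			return &chs[i];
	return hot;	/* all channels down: stay on the last active one */
}
```

In the scenario from the commit message, both channels are down after the cable is unplugged, so the hot channel C0 stays active and can still report the later link-up event.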
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:50 +0000 (11:45 +1100)]
net/ncsi: Fix stale link state of inactive channels on failover
The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. Both of them are connected to different LANs,
meaning they are in link-up state, and C0 is chosen as the active one
until a reset of the BCM5718 happens as below.
Resetting the BCM5718 results in an LSC (Link State Change) AEN packet
being received on C0, while the LSC AEN is missed on C1. When the LSC
AEN packet is received on C0 to report link-down, it fails over to C1
because C1 is in link-up state as far as software can see. However, C1
is in link-down state in hardware. The link state is therefore out of
sync between hardware and software, resulting in an inappropriate
channel (C1) being selected as the active one.
This resolves the issue by sending separate GLS (Get Link Status)
commands to all channels in the package before trying to do failover.
The latest link states of all channels in the package are retrieved.
With it, C0 (not C1) is selected as the active one as expected.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:49 +0000 (11:45 +1100)]
net/ncsi: Avoid if statements in ncsi_suspend_channel()
There are several if/else statements in the state machine implemented
by switch/case in ncsi_suspend_channel() to avoid duplicated code. It
makes the code a bit hard to understand.
This drops if/else statements in ncsi_suspend_channel() to improve the
code readability as Joel Stanley suggested. Also, it becomes easy to
add more states in the state machine without affecting current code.
No logical changes introduced by this.
Suggested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Wed, 19 Oct 2016 14:42:39 +0000 (17:42 +0300)]
net/sched: act_mirred: Use passed lastuse argument
The stats_update callback is called by NIC drivers doing hardware
offloading of the mirred action. Lastuse is passed as an argument
to specify when the stats were actually last updated, which is not
always the current time.
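A minimal sketch of the intended behaviour, using hypothetical names (struct ex_stats, ex_stats_update()): the callback stores the timestamp handed in by the driver instead of sampling the clock itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-action statistics. */
struct ex_stats {
	unsigned long long bytes;
	unsigned long long packets;
	unsigned long lastuse;	/* time of the last hit, in ticks */
};

/* Called with counters read back from hardware at some earlier time. */
static void ex_stats_update(struct ex_stats *st, unsigned long long bytes,
			    unsigned long long packets, unsigned long lastuse)
{
	assert(st != NULL);
	st->bytes += bytes;
	st->packets += packets;
	st->lastuse = lastuse;	/* the reported time, not "now" */
}
```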
Fixes: 9798e6fe4f9b ('net: act_mirred: allow statistic updates from offloaded actions')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 20 Oct 2016 14:05:45 +0000 (16:05 +0200)]
mlxsw: pci: Fix reset wait for SwitchX2
SwitchX2 firmware does not implement reset done yet. Moreover,
busy-polling for the ready magic slows down the firmware, so the reset
takes longer than the defined timeout, causing initialization to fail.
So restore the previous behaviour and just sleep in this case.
Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Thu, 20 Oct 2016 14:05:44 +0000 (16:05 +0200)]
mlxsw: switchx2: Fix ethernet port initialization
When creating an ethernet port fails, we must move the port to the
disabled state, otherwise putting the port in switch partition 0 (ETH)
or 1 (IB) will always fail.
Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 20 Oct 2016 14:05:43 +0000 (16:05 +0200)]
mlxsw: spectrum_router: Make mlxsw_sp_router_fib4_del return void and remove warn
The function return value is not checked anywhere. Also, the warning
causes a huge slowdown when removing a large number of FIB entries which
were not offloaded, because of an ordering issue. Ido is preparing
a patchset to fix the ordering issue, but that is definitely not
net tree material.
Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 20 Oct 2016 14:05:42 +0000 (16:05 +0200)]
mlxsw: spectrum_router: Use correct tree index for binding
By mistake, tree index 0 is passed to RALTB. It should be
MLXSW_SP_LPM_TREE_MIN.
Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Reported-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Wed, 19 Oct 2016 07:06:41 +0000 (09:06 +0200)]
stmmac: fix and review the ptp registration.
Commit 7086605a6ab5 ("stmmac: fix error check when init ptp")
breaks the procedure added by commit efee95f42b5d ("ptp_clock:
future-proofing drivers against PTP subsystem becoming optional").
So this patch tries to re-import the logic added by the latter
commit: it makes sense to have stmmac_ptp_register as a void
function and, inside the main, stmmac_init_ptp can fail
in case the capability cannot be supported by the HW.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Cc: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 17 Oct 2016 19:50:23 +0000 (21:50 +0200)]
netfilter: x_tables: suppress kmemcheck warning
Markus Trippelsdorf reports:
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
^
|RIP: 0010:[<ffffffff8166e561>] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
[..]
[<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
[<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0
[<ffffffff816d6715>] ipt_register_table+0xe5/0x110
[..]
[..]
This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but that's -next material.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Tue, 18 Oct 2016 20:24:07 +0000 (13:24 -0700)]
tcp: do not export sysctl_tcp_low_latency
Since commit b2fb4f54ecd4 ("tcp: uninline tcp_prequeue()") we no longer
access sysctl_tcp_low_latency from a module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 18 Oct 2016 16:59:34 +0000 (18:59 +0200)]
rtnetlink: Add rtnexthop offload flag to compare mask
The offload flag is a status flag and should not be used by
FIB semantics for comparison.
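The effect of a compare mask can be sketched like this. The flag names and bit values below are hypothetical placeholders (EX_F_*), not the uapi definitions: flags listed in the mask are status bits and are ignored when two nexthops are compared.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch. */
#define EX_F_DEAD     0x01u	/* status */
#define EX_F_ONLINK   0x04u	/* configuration */
#define EX_F_OFFLOAD  0x08u	/* status */
#define EX_F_LINKDOWN 0x10u	/* status */

/* Status flags that must not influence nexthop comparison. */
#define EX_COMPARE_MASK (EX_F_DEAD | EX_F_LINKDOWN | EX_F_OFFLOAD)

/* Returns non-zero when two flag words match outside the mask. */
static int ex_nh_flags_equal(unsigned int a, unsigned int b)
{
	/* Configuration flags such as ONLINK must stay significant. */
	assert((EX_COMPARE_MASK & EX_F_ONLINK) == 0);
	return ((a ^ b) & ~EX_COMPARE_MASK) == 0;
}
```

Without the offload bit in the mask, a route that hardware has marked as offloaded would no longer compare equal to the same route as configured by the user.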
Fixes: 37ed9493699c ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 18 Oct 2016 16:50:23 +0000 (18:50 +0200)]
switchdev: Execute bridge ndos only for bridge ports
We recently got the following warning after setting up a vlan device on
top of an offloaded bridge and executing 'bridge link':
WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
[...]
CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
Call Trace:
[<ffffffff8135eaa3>] dump_stack+0x63/0x90
[<ffffffff8108c43b>] __warn+0xcb/0xf0
[<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20
[<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
[<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
[<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140
[<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
[<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
[<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0
[<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90
[<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190
[<ffffffff8161a1b2>] netlink_dump+0x122/0x290
[<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190
[<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
[<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220
[<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0
[<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
[<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890
[<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0
[<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30
[<ffffffff8161c92c>] netlink_unicast+0x18c/0x240
[<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0
[<ffffffff815c5a48>] sock_sendmsg+0x38/0x50
[<ffffffff815c6031>] SYSC_sendto+0x101/0x190
[<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90
[<ffffffff815c6b6e>] SyS_sendto+0xe/0x10
[<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
The problem is that the 8021q module propagates the call to
ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
recognize the netdev, as it's not offloaded.
While we can ignore calls being made to non-bridge ports inside the
driver, a better fix would be to push this check up to the switchdev
layer.
Note that these ndos can be called for non-bridged netdev, but this only
happens in certain PF drivers which don't call the corresponding
switchdev functions anyway.
Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 19 Oct 2016 13:57:08 +0000 (16:57 +0300)]
net: core: Correctly iterate over lower adjacency list
Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.8.0-rc7 #1
Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
task: ffff88017edfea40 task.stack: ffff88017ee10000
RIP: 0010:[<ffffffff815dcc73>] [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
Call Trace:
<IRQ>
[<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
[<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
[<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
[<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
[<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
[<ffffffff815f2eb6>] neigh_update+0x306/0x740
[<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
[<ffffffff8165ea3f>] arp_process+0x66f/0x700
[<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
[<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
[<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
[<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
[<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
[<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
[<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
[<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
[<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
[<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
[<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
[<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
[<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
[<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
[<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
[<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
[<ffffffff81374815>] ? find_next_bit+0x15/0x20
[<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
[<ffffffff810c9968>] ? load_balance+0x178/0x9b0
[<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
[<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
[<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
[<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
[<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
[<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
[<ffffffff81091986>] tasklet_action+0xf6/0x110
[<ffffffff81704556>] __do_softirq+0xf6/0x280
[<ffffffff8109213f>] irq_exit+0xdf/0xf0
[<ffffffff817042b4>] do_IRQ+0x54/0xd0
[<ffffffff8170214c>] common_interrupt+0x8c/0x8c
The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.
Fix this by advancing the iterator, which avoids the infinite loop.
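The bug class is easy to reproduce with any "get next" helper that takes an iterator by reference. The sketch below uses a plain singly linked list and hypothetical names (struct ex_node, ex_get_next(), ex_sum_ids()); it is not the netdev adjacency code, only the same shape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node standing in for a lower device entry. */
struct ex_node {
	int id;
	struct ex_node *next;
};

/* Returns the node at *iter and advances *iter; NULL at the end. */
static struct ex_node *ex_get_next(struct ex_node **iter)
{
	struct ex_node *cur;

	assert(iter != NULL);
	cur = *iter;
	if (cur)
		*iter = cur->next;	/* without this step, callers spin */
	return cur;
}

/* Typical caller: loops until the helper reports the end of the list. */
static int ex_sum_ids(struct ex_node *head)
{
	struct ex_node *iter = head, *n;
	int sum = 0;

	while ((n = ex_get_next(&iter)) != NULL)
		sum += n->id;
	return sum;
}
```

If the assignment to *iter is dropped, ex_get_next() keeps returning the first node and the loop in ex_sum_ids() never terminates for a non-empty list.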
Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Garver [Mon, 17 Oct 2016 20:30:12 +0000 (16:30 -0400)]
flow_dissector: Check skb for VLAN only if skb specified.
Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
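A hedged sketch of the guard, assuming the dissector can be invoked on a raw buffer with no skb at all: struct ex_skb and ex_get_vlan_tci() are hypothetical, and the point is only that the skb is tested for NULL before any VLAN field is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced packet descriptor. */
struct ex_skb {
	int vlan_present;
	unsigned short vlan_tci;
};

/* Returns 1 and fills *tci if a stripped VLAN tag is available. */
static int ex_get_vlan_tci(const struct ex_skb *skb, unsigned short *tci)
{
	assert(tci != NULL);
	/* skb may be NULL when only a data buffer was supplied. */
	if (skb && skb->vlan_present) {
		*tci = skb->vlan_tci;
		return 1;
	}
	return 0;
}
```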
Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Eric Garver <e@erig.me>
Reviewed-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 17 Oct 2016 15:17:51 +0000 (15:17 +0000)]
qed: Use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().
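To show why the two forms are interchangeable, here is a minimal circular doubly linked list in userspace C. The ex_list_* names are hypothetical and only mirror the shape of the kernel helpers; the move helper is literally a delete followed by an add at the tail.

```c
#include <assert.h>

/* Hypothetical minimal circular doubly linked list. */
struct ex_list {
	struct ex_list *prev, *next;
};

static void ex_list_init(struct ex_list *h)
{
	h->prev = h->next = h;
}

/* Unlinks e from whatever list it is on. */
static void ex_list_del(struct ex_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Links e just before the head h, i.e. at the tail. */
static void ex_list_add_tail(struct ex_list *e, struct ex_list *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* One call instead of the delete-then-add pair. */
static void ex_list_move_tail(struct ex_list *e, struct ex_list *h)
{
	assert(e != h);	/* a head cannot be moved onto itself */
	ex_list_del(e);
	ex_list_add_tail(e, h);
}
```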
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 17 Oct 2016 22:16:15 +0000 (00:16 +0200)]
rocker: fix maybe-uninitialized warning
In some rare configurations, we get a warning about the 'index' variable
being used without an initialization:
drivers/net/ethernet/rocker/rocker_ofdpa.c: In function ‘ofdpa_port_fib_ipv4.isra.16.constprop’:
drivers/net/ethernet/rocker/rocker_ofdpa.c:2425:92: warning: ‘index’ may be used uninitialized in this function [-Wmaybe-uninitialized]
This is a false positive; the logic is just a bit too complex for gcc
to follow here. Moving the initialization of 'index' a little further
down makes it clear to gcc that the function always returns an error
if it is not initialized.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 17 Oct 2016 22:16:09 +0000 (00:16 +0200)]
net/hyperv: avoid uninitialized variable
The hdr_offset variable is only set if we deal with a TCP or UDP packet,
but as the check surrounding its usage tests for skb_is_gso()
instead, the compiler has no idea if the variable is initialized
or not at that point:
drivers/net/hyperv/netvsc_drv.c: In function ‘netvsc_start_xmit’:
drivers/net/hyperv/netvsc_drv.c:494:42: error: ‘hdr_offset’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
This adds an additional check for the transport type, which
tells the compiler that this path cannot happen. Since the
get_net_transport_info() function should always be inlined
here, I don't expect this to result in additional runtime
checks.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 17 Oct 2016 22:16:08 +0000 (00:16 +0200)]
net: bcm63xx: avoid referencing uninitialized variable
gcc found a reference to an uninitialized variable in the error handling
of bcm_enet_open, introduced by a recent cleanup:
drivers/net/ethernet/broadcom/bcm63xx_enet.c: In function 'bcm_enet_open'
drivers/net/ethernet/broadcom/bcm63xx_enet.c:1129:2: warning: 'phydev' may be used uninitialized in this function [-Wmaybe-uninitialized]
This makes the use of that variable conditional, so we only reference it
here after it has been used before. Unlike my normal patches, I have not
build-tested this one, as I don't currently have mips test in my
randconfig setup.
Fixes: 625eb8667d6f ("net: ethernet: broadcom: bcm63xx: use phydev from struct net_device")
Cc: Philippe Reynes <tremyfr@gmail.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 17 Oct 2016 21:22:48 +0000 (14:22 -0700)]
soreuseport: do not export reuseport_add_sock()
reuseport_add_sock() is not used from a module,
no need to export it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Mon, 17 Oct 2016 20:28:10 +0000 (15:28 -0500)]
ibmvnic: Update MTU after device initialization
It is possible for the MTU to be changed during the initialization
process with the VNIC Server. Ensure that the net device is updated
to reflect the new MTU.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Mon, 17 Oct 2016 20:28:09 +0000 (15:28 -0500)]
ibmvnic: Fix GFP_KERNEL allocation in interrupt context
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Mon, 17 Oct 2016 20:56:29 +0000 (15:56 -0500)]
ibmvnic: Driver Version 1.0.1
Increment driver version to reflect features that have
been added since release.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 18 Oct 2016 16:09:48 +0000 (18:09 +0200)]
bridge: multicast: restore perm router ports on multicast enable
Satish reported that the perm multicast router ports are not re-enabled
after a certain series of events. If multicast snooping has been disabled
and the port goes to the disabled state, it is deleted from the router port
list. When it later moves into a non-disabled state it is not re-added
because mcast snooping is still disabled, and enabling snooping later does
nothing.
Here are the steps to reproduce. Set up br0 with snooping enabled and eth1
added as a perm router (multicast_router = 2):
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
<empty>
At this point we have mcast enabled and eth1 as a perm router (value = 2)
but it is not in the router list which is incorrect.
After this change:
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
router ports on br0: eth1
Note: we can directly do br_multicast_enable_port for all because the
querier timer already has checks for the port state and will simply
expire if it's in blocking/disabled. See the comment added by commit
9aa66382163e7 ("bridge: multicast: add a comment to
br_port_state_selection about blocking state")
Fixes: 561f1103a2b7 ("bridge: Add multicast_snooping sysfs toggle")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 17 Oct 2016 22:05:30 +0000 (00:05 +0200)]
netfilter: nf_tables: avoid uninitialized variable warning
The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:
net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
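The shape of the nft_range_eval() fix described above can be sketched in user space as follows. The enum, the function name and the memcmp()-style arguments are assumptions made for this example; the point is only that each switch case decides the outcome directly, so no intermediate 'mismatch' variable can be left unset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two nft range operations. */
enum range_op { RANGE_EQ, RANGE_NEQ };

/*
 * Returns true when rule evaluation should stop ("break").
 * cmp_from and cmp_to are memcmp()-style results of comparing the
 * value against the lower and the upper bound of the range.
 */
static bool range_breaks(enum range_op op, int cmp_from, int cmp_to)
{
	switch (op) {
	case RANGE_EQ:
		/* Break when the value lies outside [from, to]. */
		return cmp_from < 0 || cmp_to > 0;
	case RANGE_NEQ:
		/* Break when the value lies inside [from, to]. */
		return cmp_from >= 0 && cmp_to <= 0;
	}
	/* Unexpected operation: nothing was left uninitialized. */
	return false;
}

static void range_example(void)
{
	/* Value equals the lower bound and is below the upper bound. */
	assert(!range_breaks(RANGE_EQ, 0, -1));
	assert(range_breaks(RANGE_NEQ, 0, -1));
}
```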
Tobias Klauser [Tue, 18 Oct 2016 09:22:54 +0000 (11:22 +0200)]
tcp: Remove unused but set variable
Remove the unused but set variable icsk in listening_get_next to fix the
following GCC warning when building with 'W=1':
net/ipv4/tcp_ipv4.c: In function ‘listening_get_next’:
net/ipv4/tcp_ipv4.c:1890:31: warning: variable ‘icsk’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Tue, 18 Oct 2016 08:51:25 +0000 (14:21 +0530)]
cxgb4: Fix number of queue sets crossing the limit
Do not let the number of offload queue sets exceed
MAX_OFLD_QSETS, which would otherwise crash the driver
on machines with more cores than MAX_OFLD_QSETS.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Tue, 18 Oct 2016 07:40:20 +0000 (09:40 +0200)]
ipv4: Remove unused but set variable
Remove the unused but set variable dev in ip_do_fragment to fix the
following GCC warning when building with 'W=1':
net/ipv4/ip_output.c: In function ‘ip_do_fragment’:
net/ipv4/ip_output.c:541:21: warning: variable ‘dev’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Tue, 18 Oct 2016 07:20:55 +0000 (09:20 +0200)]
dwc_eth_qos: enable flow control by default
Allow autoneg to enable flow control by default.
The behavior when autoneg is off has not changed.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Acked-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Tue, 18 Oct 2016 07:20:33 +0000 (09:20 +0200)]
dwc_eth_qos: do not clear pause flags from phy_device->supported
phy_device->supported is originally set by the PHY driver.
The ethernet driver should filter phy_device->supported to only contain
flags supported by the IP.
The IP supports setting rx and tx flow control independently,
therefore SUPPORTED_Pause and SUPPORTED_Asym_Pause should not be cleared.
If the flags are cleared, pause frames cannot be enabled (even if they
are supported by the PHY).
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Jesper Nilsson <jespern@axis.com>
Acked-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
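A minimal sketch of the filtering idea in the dwc_eth_qos commit above: the MAC driver masks the PHY's capability word down to what the IP supports, but leaves both pause bits untouched. The macro names and bit positions here are illustrative assumptions for this example; the real SUPPORTED_* flags are defined in the ethtool UAPI header.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative capability bits (assumed values for this sketch). */
#define SUP_1000FULL	(1u << 5)
#define SUP_10000FULL	(1u << 12)
#define SUP_PAUSE	(1u << 13)
#define SUP_ASYM_PAUSE	(1u << 14)

/* Keep only what the MAC can do, but never clear the pause bits. */
static uint32_t filter_supported(uint32_t phy_supported, uint32_t mac_features)
{
	return phy_supported & (mac_features | SUP_PAUSE | SUP_ASYM_PAUSE);
}

static void pause_example(void)
{
	uint32_t phy = SUP_1000FULL | SUP_10000FULL | SUP_PAUSE | SUP_ASYM_PAUSE;

	/* 10G is dropped, both pause flags survive. */
	assert(filter_supported(phy, SUP_1000FULL) ==
	       (SUP_1000FULL | SUP_PAUSE | SUP_ASYM_PAUSE));
}
```

Because the result is still ANDed with the PHY's own word, a PHY that never advertised pause support does not gain it.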
Tobias Klauser [Tue, 18 Oct 2016 07:07:29 +0000 (09:07 +0200)]
net/hsr: Remove unused but set variable
Remove the unused but set variable master_dev in check_local_dest to fix
the following GCC warning when building with 'W=1':
net/hsr/hsr_forward.c: In function ‘check_local_dest’:
net/hsr/hsr_forward.c:303:21: warning: variable ‘master_dev’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 14:26:15 +0000 (10:26 -0400)]
Merge tag 'mac80211-for-davem-2016-10-18' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
This is relatively small, mostly to get the fix that removes
SG/crypto data from the stack, which otherwise crashes things when
VMAP stack is used in conjunction with software crypto.
Aside from that, we have:
* a fix for AP_VLAN usage with the nl80211 frame command
* two fixes (and two preparation patches) for A-MSDU, one
to discard group-addressed (multicast) and unexpected
4-address A-MSDUs, the other to validate A-MSDU inner
MAC addresses properly to prevent controlled port bypass
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 18 Oct 2016 06:16:03 +0000 (08:16 +0200)]
bnx2: fix locking when netconsole is used
Functions bnx2_reg_rd_ind(), bnx2_reg_wr_ind() and bnx2_ctx_wr()
can be called with IRQs disabled when netconsole is enabled. So they
should use spin_{,un}lock_irq{save,restore} instead of _bh variants.
Example call flow:
bnx2_poll()
->bnx2_poll_link()
->bnx2_phy_int()
->bnx2_set_remote_link()
->bnx2_shmem_rd()
->bnx2_reg_rd_ind()
-> spin_lock_bh(&bp->indirect_lock);
spin_unlock_bh(&bp->indirect_lock);
...
-> __local_bh_enable_ip
static inline void __local_bh_enable_ip(unsigned long ip)
WARN_ON_ONCE(in_irq() || irqs_disabled()); <<<<<< WARN
Cc: Sony Chacko <sony.chacko@qlogic.com>
Cc: Dept-HSGLinuxNICDev@qlogic.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Oct 2016 17:03:04 +0000 (13:03 -0400)]
Merge branch 'net-driver-autoload'
Javier Martinez Canillas says:
====================
net: Fix module autoload for several platform drivers
I noticed that module autoload won't be working in a bunch of platform
drivers in the net subsystem and this patch series contains the fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:46 +0000 (11:05 -0300)]
net: dsa: bcm_sf2: Fix module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/dsa/bcm_sf2.ko | grep alias
alias: platform:brcm-sf2
After this patch:
$ modinfo drivers/net/dsa/bcm_sf2.ko | grep alias
alias: platform:brcm-sf2
alias: of:N*T*Cbrcm,bcm7445-switch-v4.0C*
alias: of:N*T*Cbrcm,bcm7445-switch-v4.0
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:45 +0000 (11:05 -0300)]
net: dsa: b53: Fix module autoload
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/dsa/b53/b53_mmap.ko | grep alias
$
After this patch:
$ modinfo drivers/net/dsa/b53/b53_mmap.ko | grep alias
alias: of:N*T*Cbrcm,bcm63xx-switchC*
alias: of:N*T*Cbrcm,bcm63xx-switch
alias: of:N*T*Cbrcm,bcm6368-switchC*
alias: of:N*T*Cbrcm,bcm6368-switch
alias: of:N*T*Cbrcm,bcm6328-switchC*
alias: of:N*T*Cbrcm,bcm6328-switch
alias: of:N*T*Cbrcm,bcm3384-switchC*
alias: of:N*T*Cbrcm,bcm3384-switch
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:44 +0000 (11:05 -0300)]
net: hisilicon: Fix hns_mdio module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/hisilicon//hns_mdio.ko | grep alias
alias: platform:Hi-HNS_MDIO
alias: acpi*:HISI0141:*
After this patch:
$ modinfo drivers/net/ethernet/hisilicon//hns_mdio.ko | grep alias
alias: platform:Hi-HNS_MDIO
alias: of:N*T*Chisilicon,hns-mdioC*
alias: of:N*T*Chisilicon,hns-mdio
alias: of:N*T*Chisilicon,mdioC*
alias: of:N*T*Chisilicon,mdio
alias: acpi*:HISI0141:*
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:43 +0000 (11:05 -0300)]
net: qcom/emac: Fix module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/qualcomm/emac/qcom-emac.ko | grep alias
alias: platform:qcom-emac
After this patch:
$ modinfo drivers/net/ethernet/qualcomm/emac/qcom-emac.ko | grep alias
alias: platform:qcom-emac
alias: of:N*T*Cqcom,fsm9900-emacC*
alias: of:N*T*Cqcom,fsm9900-emac
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:42 +0000 (11:05 -0300)]
net: hns: Fix hns_dsaf module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/hisilicon/hns/hns_dsaf.ko | grep alias
alias: acpi*:HISI00B2:*
alias: acpi*:HISI00B1:*
After this patch:
$ modinfo drivers/net/ethernet/hisilicon/hns/hns_dsaf.ko | grep alias
alias: acpi*:HISI00B2:*
alias: acpi*:HISI00B1:*
alias: of:N*T*Chisilicon,hns-dsaf-v2C*
alias: of:N*T*Chisilicon,hns-dsaf-v2
alias: of:N*T*Chisilicon,hns-dsaf-v1C*
alias: of:N*T*Chisilicon,hns-dsaf-v1
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:41 +0000 (11:05 -0300)]
net: ethernet: nb8800: Fix module autoload
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/aurora/nb8800.ko | grep alias
$
After this patch:
$ modinfo drivers/net/ethernet/aurora/nb8800.ko | grep alias
alias: of:N*T*Csigma,smp8734-ethernetC*
alias: of:N*T*Csigma,smp8734-ethernet
alias: of:N*T*Csigma,smp8642-ethernetC*
alias: of:N*T*Csigma,smp8642-ethernet
alias: of:N*T*Caurora,nb8800C*
alias: of:N*T*Caurora,nb8800
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Mon, 17 Oct 2016 14:05:40 +0000 (11:05 -0300)]
net: nps_enet: Fix module autoload
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/ezchip/nps_enet.ko | grep alias
$
After this patch:
$ modinfo drivers/net/ethernet/ezchip/nps_enet.ko | grep alias
alias: of:N*T*Cezchip,nps-mgt-enetC*
alias: of:N*T*Cezchip,nps-mgt-enet
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Thu, 13 Oct 2016 06:42:17 +0000 (08:42 +0200)]
netfilter: nft_range: validate operation netlink attribute
Use nft_parse_u32_check() to make sure we don't get a value over the
unsigned 8-bit integer. Moreover, make sure this value doesn't go over
the two supported range comparison modes.
Fixes: 9286c2eb1fda ("netfilter: nft_range: validate operation netlink attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
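The two-stage validation described in the nft_range commit above could look roughly like this in user space. parse_u32_check() is a hypothetical stand-in for nft_parse_u32_check(), and the error codes chosen here are assumptions for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two supported comparison modes. */
enum { RANGE_EQ, RANGE_NEQ };

/* Stage 1: reject values that do not fit the destination width. */
static int parse_u32_check(uint32_t val, uint32_t max, uint32_t *dest)
{
	if (val > max)
		return -ERANGE;
	*dest = val;
	return 0;
}

/* Stage 2: accept only the operations that are actually implemented. */
static int parse_range_op(uint32_t attr, uint8_t *op)
{
	uint32_t val;
	int err;

	err = parse_u32_check(attr, UINT8_MAX, &val);
	if (err < 0)
		return err;

	switch (val) {
	case RANGE_EQ:
	case RANGE_NEQ:
		break;
	default:
		return -EINVAL;
	}

	*op = (uint8_t)val;
	return 0;
}

static void range_op_example(void)
{
	uint8_t op = 0;

	assert(parse_range_op(RANGE_NEQ, &op) == 0 && op == RANGE_NEQ);
	assert(parse_range_op(2, &op) == -EINVAL);	/* fits in u8, unknown mode */
	assert(parse_range_op(256, &op) == -ERANGE);	/* does not fit in u8 */
}
```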
Dan Carpenter [Wed, 12 Oct 2016 06:09:12 +0000 (09:09 +0300)]
netfilter: nft_exthdr: fix error handling in nft_exthdr_init()
"err" needs to be signed for the error handling to work.
Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
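To illustrate why the sign of "err" matters in the nft_exthdr commit above, here is a small user-space sketch. parse_field() is a made-up helper that returns a negative errno on failure; storing that result in an unsigned variable makes the "< 0" test always false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical parser that reports failure with a negative errno. */
static int parse_field(int raw)
{
	return raw > 255 ? -ERANGE : 0;
}

/* Buggy shape: an unsigned 'err' can never compare below zero. */
static bool failed_unsigned(int raw)
{
	unsigned int err = parse_field(raw);

	return err < 0;	/* always false */
}

/* Fixed shape: keep the return value in a signed int. */
static bool failed_signed(int raw)
{
	int err = parse_field(raw);

	return err < 0;
}

static void signed_err_example(void)
{
	assert(!failed_unsigned(1000));	/* the error is silently ignored */
	assert(failed_signed(1000));	/* the error is detected */
}
```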
Dan Carpenter [Wed, 12 Oct 2016 09:14:29 +0000 (12:14 +0300)]
netfilter: nf_tables: underflow in nft_parse_u32_check()
We don't want to allow negatives here.
Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Wed, 12 Oct 2016 13:10:45 +0000 (21:10 +0800)]
netfilter: nft_hash: add missing NFTA_HASH_OFFSET's nla_policy
Without the nla_policy entry for NFTA_HASH_OFFSET, the kernel also skips
the validation check for that attribute.
Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Wed, 12 Oct 2016 13:09:22 +0000 (21:09 +0800)]
netfilter: xt_ipcomp: add "ip[6]t_ipcomp" module alias name
Otherwise, user cannot add related rules if xt_ipcomp.ko is not loaded:
# iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1
iptables: No chain/target/match by that name.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Tue, 11 Oct 2016 13:03:45 +0000 (21:03 +0800)]
netfilter: xt_NFLOG: fix unexpected truncated packet
Justin and Chris spotted that iptables NFLOG target was broken when they
upgraded the kernel to 4.8: "ulogd-2.0.5- IPs are no longer logged" or
"results in segfaults in ulogd-2.0.5".
Because "struct nf_loginfo li;" is a local variable that is not
zero-initialized, its flags field is filled with a garbage value. So if it
contains 0x1, packets will not be logged to the userspace anymore.
Fixes: 7643507fe8b5 ("netfilter: xt_NFLOG: nflog-range does not truncate packets")
Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Reported-by: Chris Caputo <ccaputo@alt.net>
Tested-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Nicolas Dichtel [Mon, 10 Oct 2016 10:18:23 +0000 (12:18 +0200)]
netfilter: conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)
This entry has been removed in commit 9500507c6138.
Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anders K. Pedersen [Sun, 9 Oct 2016 13:49:02 +0000 (13:49 +0000)]
netfilter: nft_dynset: fix element timeout for HZ != 1000
With HZ=100 element timeout in dynamic sets (i.e. flow tables) is 10 times
higher than configured.
Add proper conversion to/from jiffies, when interacting with userspace.
I tested this on Linux 4.8.1, and it applies cleanly to current nf and
nf-next trees.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
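The factor of 10 mentioned in the nft_dynset commit above follows directly from treating a millisecond value as a jiffies count. The sketch below uses simplified conversion helpers that take the tick rate as a parameter; they are written for this example and are not the kernel's msecs_to_jiffies() or jiffies_to_msecs().

```c
#include <assert.h>
#include <stdint.h>

/* Convert milliseconds to ticks, rounding up to a whole tick. */
static uint64_t ms_to_jiffies(uint64_t ms, unsigned int hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* Convert ticks back to milliseconds. */
static uint64_t jiffies_to_ms(uint64_t jiffies, unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000 / hz;
}
```

With a configured timeout of 5000 ms and HZ=100, storing 5000 directly as a jiffies count corresponds to jiffies_to_ms(5000, 100) = 50000 ms, ten times the requested value. Converting first gives ms_to_jiffies(5000, 100) = 500 ticks, which is 5000 ms again. With HZ=1000 both paths agree, which is why the bug only shows for HZ != 1000.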
Geert Uytterhoeven [Thu, 6 Oct 2016 13:40:14 +0000 (15:40 +0200)]
netfilter: xt_hashlimit: Add missing ULL suffixes for 64-bit constants
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
net/netfilter/xt_hashlimit.c: In function ‘user2credits’:
net/netfilter/xt_hashlimit.c:476: warning: integer constant is too large for ‘long’ type
...
net/netfilter/xt_hashlimit.c:478: warning: integer constant is too large for ‘long’ type
...
net/netfilter/xt_hashlimit.c:480: warning: integer constant is too large for ‘long’ type
...
net/netfilter/xt_hashlimit.c: In function ‘rateinfo_recalc’:
net/netfilter/xt_hashlimit.c:513: warning: integer constant is too large for ‘long’ type
Fixes: 11d5f15723c9f39d ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
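A short sketch of the kind of change the xt_hashlimit commit above describes. The constant and the function are illustrative assumptions, not the values used in xt_hashlimit.c; the relevant part is the ULL suffix, which gives the literal an unsigned 64-bit type even where 'long' is only 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative 64-bit constant. Without the ULL suffix, an older
 * compiler in C89/gnu89 mode on a 32-bit target tries to fit the
 * literal into 'long' and warns that it is too large.
 */
#define MAX_CREDITS_64	0xFFFFFFFFFFFFFFFFULL

/* Hypothetical use of the constant in 64-bit arithmetic. */
static uint64_t scale_credits(uint64_t user_rate)
{
	assert(user_rate > 0);
	return MAX_CREDITS_64 / user_rate;
}
```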
Colin Ian King [Sun, 16 Oct 2016 22:54:03 +0000 (23:54 +0100)]
cxgb4: fix memory leak of qe on error exit path
A memory leak of qe occurs when t4_sched_queue_unbind fails,
so fix this by free'ing qe on the error exit path.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 15 Oct 2016 15:50:49 +0000 (17:50 +0200)]
net: pktgen: remove rcu locking in pktgen_change_name()
After Jesper's commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().
This patch converts t->if_lock to a mutex, since it is now only
used from control path, and adds proper locking to pktgen_change_name():
1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)
Note that before Jesper's patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.
Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 17 Oct 2016 03:02:52 +0000 (20:02 -0700)]
net: Require exact match for TCP socket lookups if dif is l3mdev
Currently, socket lookups for l3mdev (vrf) use cases can match a socket
that is bound to a port but not a device (i.e., a global socket). If the
sysctl tcp_l3mdev_accept is not set this leads to ack packets going out
based on the main table even though the packet came in from an L3 domain.
The end result is that the connection is not established, creating
confusion for users since the service is running and a socket shows in
ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
skb came through an interface enslaved to an l3mdev device and the
tcp_l3mdev_accept is not set.
skb's through an l3mdev interface are marked by setting a flag in
inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the
flag for IPv4. Using an skb flag avoids a device lookup on the dif. The
flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the
inet_skb_parm struct is moved in the cb per commit 971f10eca186, so the
match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the
move is done after the socket lookup, so IP6CB is used.
The flags field in inet_skb_parm struct needs to be increased to add
another flag. There is currently a 1-byte hole following the flags,
so it can be expanded to u16 without increasing the size of the struct.
Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
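The remark in the l3mdev commit above about widening the flags field into an existing hole can be checked with a small layout sketch. Both structs are hypothetical and much smaller than the real inet_skb_parm; the result also assumes an ABI where uint16_t is 2-byte aligned, which holds on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with an 8-bit flags field. */
struct parm_u8_flags {
	int      iif;
	uint8_t  flags;		/* followed by one byte of padding */
	uint16_t frag_max_size;
};

/* Same layout with flags widened to 16 bits. */
struct parm_u16_flags {
	int      iif;
	uint16_t flags;		/* the padding byte is now part of the field */
	uint16_t frag_max_size;
};

/* True when widening changed neither the size nor the next field's offset. */
static int widening_keeps_layout(void)
{
	return sizeof(struct parm_u8_flags) == sizeof(struct parm_u16_flags) &&
	       offsetof(struct parm_u8_flags, frag_max_size) ==
	       offsetof(struct parm_u16_flags, frag_max_size);
}

static void layout_example(void)
{
	assert(widening_keeps_layout());
}
```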
Ard Biesheuvel [Mon, 17 Oct 2016 14:05:33 +0000 (15:05 +0100)]
mac80211: move struct aead_req off the stack
Some crypto implementations (such as the generic CCM wrapper in crypto/)
use scatterlists to map fields of private data in their struct aead_req.
This means these data structures cannot live in the vmalloc area, which
means that they cannot live on the stack (with CONFIG_VMAP_STACK.)
This currently occurs only with the generic software implementation, but
the private data and usage is implementation specific, so move the whole
data structures off the stack into heap by allocating every time we need
to use them.
In addition, take care not to put any of our own stack allocations into
scatterlists. This involves reserving some extra room when allocating the
aead_request structures, and referring to those allocations in the
scatterlists (while copying the data from the stack before the crypto
operation).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Alexey Khoroshilov [Fri, 14 Oct 2016 21:01:20 +0000 (00:01 +0300)]
vmxnet3: avoid assumption about invalid dma_pa in vmxnet3_set_mc()
vmxnet3_set_mc() checks new_table_pa returned by dma_map_single()
with dma_mapping_error(), but even there it assumes zero is invalid pa
(it assumes dma_mapping_error(...,0) returns true if new_table is NULL).
The patch adds an explicit variable to track status of new_table_pa.
Found by Linux Driver Verification project (linuxtesting.org).
v2: use "bool" and "true"/"false" for boolean variables.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 14 Oct 2016 19:26:11 +0000 (22:26 +0300)]
stmmac: fix an error code in stmmac_ptp_register()
PTR_ERR(NULL) is success. We have to preserve the error code earlier.
Fixes: 7086605a6ab5 ("stmmac: fix error check when init ptp")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
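Why "PTR_ERR(NULL) is success" in the stmmac commit above can be shown with a user-space imitation of the kernel's error-pointer helpers. The helper bodies, the struct and the two register functions are simplified assumptions for this sketch; the buggy variant clears the pointer before reading the error code back, the fixed variant saves the code first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space imitation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct priv {
	void *clock;
};

/* Buggy shape: the pointer is cleared before the code is read back. */
static long register_buggy(struct priv *p, void *clock)
{
	p->clock = clock;
	if (IS_ERR(p->clock)) {
		p->clock = NULL;
		return PTR_ERR(p->clock);	/* PTR_ERR(NULL) == 0 */
	}
	return 0;
}

/* Fixed shape: save the error code first, then clear the pointer. */
static long register_fixed(struct priv *p, void *clock)
{
	long ret = 0;

	p->clock = clock;
	if (IS_ERR(p->clock)) {
		ret = PTR_ERR(p->clock);
		p->clock = NULL;
	}
	return ret;
}

static void ptr_err_example(void)
{
	struct priv p;

	assert(register_buggy(&p, ERR_PTR(-ENOMEM)) == 0);	/* error lost */
	assert(register_fixed(&p, ERR_PTR(-ENOMEM)) == -ENOMEM);
}
```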
Timur Tabi [Fri, 14 Oct 2016 19:14:35 +0000 (14:14 -0500)]
net: qcom/emac: disable interrupts before calling phy_disconnect
There is a race condition that can occur if EMAC interrupts are
enabled when phy_disconnect() is called. phy_disconnect() sets
adjust_link to NULL. When an interrupt occurs, the ISR might
call phy_mac_interrupt(), which wakes up the workqueue function
phy_state_machine(). This function might reference adjust_link,
thereby causing a null pointer exception.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ard Biesheuvel [Fri, 14 Oct 2016 13:40:33 +0000 (14:40 +0100)]
r8169: set coherent DMA mask as well as streaming DMA mask
PCI devices that are 64-bit DMA capable should set the coherent
DMA mask as well as the streaming DMA mask. On some architectures,
these are managed separately, and so the coherent DMA mask will be
left at its default value of 32 if it is not set explicitly. This
results in errors such as
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
hwdev DMA mask = 0x00000000ffffffff, dev_addr = 0x00000080fbfff000
swiotlb: coherent allocation failed for device 0000:02:00.0 size=4096
CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
on systems without memory that is 32-bit addressable by PCI devices.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Oct 2016 20:08:13 +0000 (16:08 -0400)]
Merge tag 'wireless-drivers-for-davem-2016-10-14' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.9
wlcore
* fix a double free regression causing hard to track crashes
rtl8xxxu
* fix driver reload issues, a memory leak and an endian bug
rtlwifi
* fix a major regression introduced in 4.9 with firmware loading on
certain hardware
ath10k
* fix regression about broken cal_data debugfs file (since 4.7)
ath9k
* revert temperature compensation for AR9003+ devices, it was causing
too many problems
ath6kl
* add Dell OEM SDIO I/O for the Venue 8 Pro
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Thu, 13 Oct 2016 23:43:16 +0000 (16:43 -0700)]
net: asix: Avoid looping when the device does not respond
Check answers from USB stack and avoid re-sending the request
multiple times if the device does not respond.
This fixes the following problem, observed with a probably flaky adapter.
[62108.732707] usb 1-3: new high-speed USB device number 5 using xhci_hcd
[62108.914421] usb 1-3: New USB device found, idVendor=0b95, idProduct=7720
[62108.914463] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[62108.914476] usb 1-3: Product: AX88x72A
[62108.914486] usb 1-3: Manufacturer: ASIX Elec. Corp.
[62108.914495] usb 1-3: SerialNumber: 000001
[62114.109109] asix 1-3:1.0 (unnamed net_device) (uninitialized): Failed to write reg index 0x0000: -110
[62114.109139] asix 1-3:1.0 (unnamed net_device) (uninitialized): Failed to send software reset: ffffff92
[62119.109048] asix 1-3:1.0 (unnamed net_device) (uninitialized): Failed to write reg index 0x0000: -110
...
Since the USB timeout is 5 seconds, and the operation is retried 30 times,
the caller can block for up to 150 seconds, which results in
[62278.180353] INFO: task mtpd:1725 blocked for more than 120 seconds.
[62278.180373] Tainted: G W 3.18.0-13298-g94ace9e #1
[62278.180383] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
[62278.180957] kworker/2:0 D 0000000000000000 0 5744 2 0x00000000
[62278.180978] Workqueue: usb_hub_wq hub_event
[62278.181029] ffff880177f833b8 0000000000000046 ffff88017fd00000 ffff88017b126d80
[62278.181048] ffff880177f83fd8 ffff880065a71b60 0000000000013340 ffff880065a71b60
[62278.181065] 0000000000000286 0000000103b1c199 0000000000001388 0000000000000002
[62278.181081] Call Trace:
[62278.181092] [<ffffffff8e0971fd>] ? console_conditional_schedule+0x2c/0x2c
[62278.181105] [<ffffffff8e094f7b>] schedule+0x69/0x6b
[62278.181117] [<ffffffff8e0972e0>] schedule_timeout+0xe3/0x11d
[62278.181133] [<ffffffff8daadb1b>] ? trace_timer_start+0x51/0x51
[62278.181146] [<ffffffff8e095a05>] do_wait_for_common+0x12f/0x16c
[62278.181162] [<ffffffff8da856a7>] ? wake_up_process+0x39/0x39
[62278.181174] [<ffffffff8e095aee>] wait_for_common+0x52/0x6d
[62278.181187] [<ffffffff8e095b3b>] wait_for_completion_timeout+0x13/0x15
[62278.181201] [<ffffffff8de676ce>] usb_start_wait_urb+0x93/0xf1
[62278.181214] [<ffffffff8de6780d>] usb_control_msg+0xe1/0x11d
[62278.181230] [<ffffffffc037d629>] usbnet_write_cmd+0x9c/0xc6 [usbnet]
[62278.181286] [<ffffffffc03af793>] asix_write_cmd+0x4e/0x7e [asix]
[62278.181300] [<ffffffffc03afb41>] asix_set_sw_mii+0x25/0x4e [asix]
[62278.181314] [<ffffffffc03b001d>] asix_mdio_read+0x51/0x109 [asix]
...
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Thu, 13 Oct 2016 23:13:55 +0000 (16:13 -0700)]
ethtool: silence warning on bit loss
Sparse was complaining when we went to prototype some code
using ethtool_cmd_speed_set and SPEED_100000, which uses
the upper 16 bits of __u32 speed for the first time.
CHECK
...
.../uapi/linux/ethtool.h:123:28: warning:
cast truncates bits from constant value (186a0 becomes 86a0)
The warning is actually bogus, as no bits are really lost, but
we can get rid of the sparse warning with this one small change.
Reported-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
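The numbers in the sparse warning from the ethtool commit above can be reproduced with a cut-down model of the split speed field. The struct and function names are stand-ins for this sketch, and masking with 0xFFFF before the cast is shown as one way to make the intended truncation explicit; the log does not show the actual one-line change.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down stand-in for the two speed fields of struct ethtool_cmd. */
struct speed_fields {
	uint16_t speed;		/* low 16 bits */
	uint16_t speed_hi;	/* high 16 bits */
};

static void speed_set(struct speed_fields *ep, uint32_t speed)
{
	/* Masking first makes the intended truncation explicit. */
	ep->speed = (uint16_t)(speed & 0xFFFF);
	ep->speed_hi = (uint16_t)(speed >> 16);
}

static uint32_t speed_get(const struct speed_fields *ep)
{
	return ((uint32_t)ep->speed_hi << 16) | ep->speed;
}

static void speed_example(void)
{
	struct speed_fields f;

	speed_set(&f, 100000);	/* 100000 == 0x186a0 */
	assert(f.speed == 0x86a0 && f.speed_hi == 0x1);
	assert(speed_get(&f) == 100000);	/* no bits are lost overall */
}
```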
Brenden Blanco [Thu, 13 Oct 2016 20:13:11 +0000 (13:13 -0700)]
net/mlx4_en: fixup xdp tx irq to match rx
In cases where the number of tx rings is not a multiple of the number of
rx rings, the tx completion event will be handled on a different core
from the transmit and population of the ring. Races on the ring will
lead to a double-free of the page, and possibly other corruption.
The rings are initialized by default with a valid multiple of rings,
based on the number of cpus, therefore an invalid configuration requires
ethtool to change the ring layout. For instance 'ethtool -L eth0 rx 9 tx
8' will cause packets received on rx0, and XDP_TX'd to tx48, to be
completed on cpu3 (48 % 9 == 3).
Resolve this discrepancy by shifting the irq for the xdp tx queues to
start again from 0, modulo rx_ring_num.
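The index arithmetic can be illustrated with a small sketch. The function names and the first_xdp_ring parameter are hypothetical; the value 48 for the first XDP tx ring is taken from the rx0 -> tx48 example above and is an assumption about the ring layout, not driver code.

```c
/* Old mapping: the tx ring index is reduced modulo the rx ring count directly. */
static int irq_index_old(int tx_ring, int rx_ring_num)
{
	return tx_ring % rx_ring_num;
}

/*
 * New mapping: XDP tx rings restart from 0 before the modulo is applied.
 * first_xdp_ring is a hypothetical parameter: the index of the first XDP tx ring.
 */
static int irq_index_new(int tx_ring, int first_xdp_ring, int rx_ring_num)
{
	return (tx_ring - first_xdp_ring) % rx_ring_num;
}
```

With rx_ring_num = 9, irq_index_old(48, 9) gives 3, while irq_index_new(48, 48, 9) gives 0, which matches the rx ring that received the packet.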
Fixes: 9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Oct 2016 15:07:23 +0000 (11:07 -0400)]
Merge branch 'qed-fixes'
Yuval Mintz says:
====================
qed: Fix dependencies and warnings series
The first patch in this series follows Dan Carpenter's reports about
Smatch warnings for recent qed additions and fixes those.
The second patch is the most significant one [and the reason this is
intended for 'net'] - it's based on Arnd Bergmann's suggestion for fixing
compilation issues that were introduced with the roce addition as a result
of certain combinations of qed, qede and qedr Kconfig options.
The third follows the discussion with Arnd and clears a lot of the warnings
that arise when compiling the drivers with "C=1".
Please consider applying this series to 'net'.
====================
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 13 Oct 2016 19:57:03 +0000 (22:57 +0300)]
qed: Additional work toward cleaning C=1
This cleans many of the warnings that would arise in qed as a
result of compilations with C=1; Most of those are the addition
of missing 'static' to functions, although there are several other
fixes as well.
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 13 Oct 2016 19:57:02 +0000 (22:57 +0300)]
qed*: Fix Kconfig dependencies with INFINIBAND_QEDR
The qedr driver would require a tristate Kconfig option [to allow
it to compile as a module], and toward that end we've added the
INFINIBAND_QEDR option. But as we've made the compilation of the
qed/qede infrastructure required for RoCE dependent on the option
we'd be facing linking difficulties in case that QED=y or QEDE=y,
and INFINIBAND_QEDR=m.
To resolve this, we separate the INFINIBAND_QEDR option
from the infrastructure support in qed/qede by introducing a new
QED_RDMA option which would be selected by INFINIBAND_QEDR but would
be a boolean instead of a tristate; Following that, the qed/qede is
fixed based on this new option so that all config combinations would
be supported.
Fixes: cee9fbd8e2e9 ("qede: add qedr framework")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 13 Oct 2016 19:57:01 +0000 (22:57 +0300)]
qed: Fix static checker warning.
Smatch complains about qed_roce_ll2_tx() dereferencing
the 'cdev' variable while testing its validity later.
As the validation checking is an over-kill [variable would always
be set], simply remove it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Bohac [Thu, 13 Oct 2016 16:52:15 +0000 (18:52 +0200)]
IPv6: fix DESYNC_FACTOR
The IPv6 temporary address generation uses a variable called DESYNC_FACTOR
to prevent hosts updating the addresses at the same time. Quoting RFC 4941:
... The value DESYNC_FACTOR is a random value (different for each
client) that ensures that clients don't synchronize with each other and
generate new addresses at exactly the same time ...
DESYNC_FACTOR is defined as:
DESYNC_FACTOR -- A random value within the range 0 - MAX_DESYNC_FACTOR.
It is computed once at system start (rather than each time it is used)
and must never be greater than (TEMP_VALID_LIFETIME - REGEN_ADVANCE).
First, I believe the RFC has a typo in it and meant to say: "and must
never be greater than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE)"
The reason is that at various places in the RFC, DESYNC_FACTOR is used in
a calculation like (TEMP_PREFERRED_LIFETIME - DESYNC_FACTOR) or
(TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR). It needs to be
smaller than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE) for the result of
these calculations to be larger than zero. It's never used in a
calculation together with TEMP_VALID_LIFETIME.
I already submitted an errata to the rfc-editor:
https://www.rfc-editor.org/errata_search.php?rfc=4941
The Linux implementation of DESYNC_FACTOR is very wrong:
max_desync_factor is used in places DESYNC_FACTOR should be used.
max_desync_factor is initialized to the RFC-recommended value for
MAX_DESYNC_FACTOR (600) but the whole point is to get a _random_ value.
And nothing ensures that the value used is not greater than
(TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE), which leads to underflows. The
effect can easily be observed when setting the temp_prefered_lft sysctl
e.g. to 60. The preferred lifetime of the temporary addresses will be
bogus.
TEMP_PREFERRED_LIFETIME and REGEN_ADVANCE are not constants and can be
influenced by these three sysctls: regen_max_retry, dad_transmits and
temp_prefered_lft. Thus, the upper bound for desync_factor needs to be
re-calculated each time a new address is generated and if desync_factor is
larger than the new upper bound, a new random value needs to be
re-generated.
And since we already have max_desync_factor configurable per interface, we
also need to calculate and store desync_factor per interface.
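A minimal user-space sketch of the bound-and-regenerate logic described above follows. The struct and function names are hypothetical, rand() stands in for whatever random source the kernel uses, and the fields are plain seconds; this illustrates the idea and is not the patch itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-interface state, all values in seconds. */
struct tmp_addr_cfg {
	uint32_t temp_prefered_lft; /* TEMP_PREFERRED_LIFETIME */
	uint32_t regen_advance;     /* REGEN_ADVANCE */
	uint32_t max_desync_factor; /* MAX_DESYNC_FACTOR */
	uint32_t desync_factor;     /* current DESYNC_FACTOR */
};

/* Upper bound: min(MAX_DESYNC_FACTOR, TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE),
 * clamped to 0 so the unsigned subtraction cannot underflow. */
static uint32_t desync_upper_bound(const struct tmp_addr_cfg *c)
{
	uint32_t room;

	if (c->temp_prefered_lft <= c->regen_advance)
		return 0;
	room = c->temp_prefered_lft - c->regen_advance;
	return room < c->max_desync_factor ? room : c->max_desync_factor;
}

/* Re-check the bound before generating an address; draw a new random
 * value in [0, bound] only if the stored one no longer fits. */
static void refresh_desync(struct tmp_addr_cfg *c)
{
	uint32_t bound = desync_upper_bound(c);

	if (c->desync_factor > bound)
		c->desync_factor =
			(uint32_t)((uint64_t)rand() % ((uint64_t)bound + 1));
}
```

For example, with temp_prefered_lft set to 60 and an assumed regen_advance of 5, the bound is 55, so a stored desync_factor of 600 is replaced by a random value no larger than 55.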
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Bohac [Thu, 13 Oct 2016 16:50:02 +0000 (18:50 +0200)]
IPv6: Drop the temporary address regen_timer
The randomized interface identifier (rndid) was periodically updated from
the regen_timer timer. Simplify the code by updating the rndid only when
needed by ipv6_try_regen_rndid().
This makes the follow-up DESYNC_FACTOR fix much simpler. Also it fixes a
reference counting error in this error path, where an in6_dev_put was
missing:
err = addrconf_sysctl_register(ndev);
if (err) {
ipv6_mc_destroy_dev(ndev);
- del_timer(&ndev->regen_timer);
snmp6_unregister_dev(ndev);
goto err_release;
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 13 Oct 2016 16:26:56 +0000 (18:26 +0200)]
IB/ipoib: move back IB LL address into the hard header
After commit 9207f9d45b0a ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroys the IPoIB address information cached there,
causing a severe performance regression, as better described here:
http://marc.info/?l=linux-kernel&m=146787279825501&w=2
This change moves the data cached by the IPoIB driver from the
skb control block into the IPoIB hard header, as done before
commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issues, on packet reception, the IPoIB driver
stashes a dummy pseudo header into the skb, so that the received
packets actually have a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.
After this commit, IPoIB performance is back to its pre-regression
value.
v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size
Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Oct 2016 14:44:45 +0000 (10:44 -0400)]
Merge tag 'rxrpc-rewrite-20161013' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fixes
This set of patches contains a bunch of fixes:
(1) Fix use of kunmap() after change from kunmap_atomic() within AFS.
(2) Don't use ERR_PTR() with an always zero value.
(3) Check the right error when using ip6_route_output().
(4) Be consistent about whether call->operation_ID is BE or CPU-E within
AFS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Alemayhu [Thu, 13 Oct 2016 15:09:51 +0000 (17:09 +0200)]
Documentation/networking: update git urls to use https over http
This fixes the following errors when trying to clone the urls:
Cloning into 'net'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/' not found
Cloning into 'net-next'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/' not found
Cloning into 'linux'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/' not found
Cloning into 'stable-queue'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/' not found
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Wed, 12 Oct 2016 19:05:59 +0000 (16:05 -0300)]
net: wan: slic_ds26522: Allow driver to be built if COMPILE_TEST is enabled
The driver only has runtime but no build time dependency with FSL_SOC ||
ARCH_MXC || ARCH_LAYERSCAPE. So it can be built for testing purposes if
the COMPILE_TEST option is enabled.
This is useful to have more build coverage and make sure that the driver
is not affected by changes that could cause build regressions.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Wed, 12 Oct 2016 18:55:41 +0000 (15:55 -0300)]
net: wan: slic_ds26522: Export OF module alias information
When the device is registered via OF, the OF table is used to match the
driver instead of the SPI device ID table, but the entries in the latter
are used as aliases to load the module if the driver was not built-in.
This is because the SPI core always reports an SPI module alias instead
of an OF one, but that could change so it's better to always export it.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Martinez Canillas [Wed, 12 Oct 2016 18:55:40 +0000 (15:55 -0300)]
net: wan: slic_ds26522: add SPI device ID table to fix module autoload
If the driver is built as a module, module alias information isn't filled
so the module won't be autoloaded. Add a SPI device ID table and use the
MODULE_DEVICE_TABLE() macro so the information is exported in the module.
Before this patch:
$ modinfo drivers/net/wan/slic_ds26522.ko | grep alias
$
After this patch:
$ modinfo drivers/net/wan/slic_ds26522.ko | grep alias
alias: spi:ds26522
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 12 Oct 2016 08:10:40 +0000 (10:10 +0200)]
ipv6: correctly add local routes when lo goes up
The goal of the patch is to fix this scenario:
ip link add dummy1 type dummy
ip link set dummy1 up
ip link set lo down ; ip link set lo up
After that sequence, the local route to the link layer address of dummy1 is
not there anymore.
When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.
In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still set to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.
Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo")
Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Weilong Chen <chenweilong@huawei.com>
CC: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Fedorenko [Tue, 11 Oct 2016 19:47:20 +0000 (22:47 +0300)]
ip6_tunnel: fix ip6_tnl_lookup
Commit ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel
endpoints.") introduced support for wildcards in tunnel endpoints,
but in some rare circumstances ip6_tnl_lookup selects the wrong tunnel
interface, relying only on the source or destination address of the packet
and not checking for the presence of a wildcard in the tunnel endpoints.
Later, in ip6_tnl_rcv, these packets can be discarded because of a
difference in ipproto even if the fallback device has a proper ipproto
configuration. This patch adds checks for a wildcard endpoint in the
tunnel, avoiding such behavior.
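The intended matching order can be sketched as a toy lookup. Everything here (struct addr6, struct tunnel, tunnel_lookup()) is hypothetical and far simpler than the hashed tables in the kernel. It only illustrates that a one-sided match should be accepted when the other endpoint of the tunnel is a wildcard (the all-zero address), and that the caller should otherwise use the fallback device.

```c
#include <stddef.h>
#include <string.h>

struct addr6 { unsigned char b[16]; };

struct tunnel {
	struct addr6 local;  /* all-zero means wildcard */
	struct addr6 remote; /* all-zero means wildcard */
};

static int addr_any(const struct addr6 *a)
{
	static const struct addr6 zero;

	return memcmp(a, &zero, sizeof(zero)) == 0;
}

static int addr_eq(const struct addr6 *a, const struct addr6 *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* Return the best matching tunnel, or NULL so the caller can use the fallback. */
static const struct tunnel *tunnel_lookup(const struct tunnel *t, size_t n,
					  const struct addr6 *local,
					  const struct addr6 *remote)
{
	size_t i;

	/* Pass 1: both endpoints match exactly. */
	for (i = 0; i < n; i++)
		if (addr_eq(&t[i].local, local) && addr_eq(&t[i].remote, remote))
			return &t[i];

	/* Pass 2: local matches, and the tunnel's remote is a wildcard. */
	for (i = 0; i < n; i++)
		if (addr_eq(&t[i].local, local) && addr_any(&t[i].remote))
			return &t[i];

	/* Pass 3: remote matches, and the tunnel's local is a wildcard. */
	for (i = 0; i < n; i++)
		if (addr_eq(&t[i].remote, remote) && addr_any(&t[i].local))
			return &t[i];

	return NULL;
}
```

Without the addr_any() checks in passes 2 and 3, a tunnel with a fixed but different peer address would be selected on the strength of one matching endpoint alone, which is the mis-selection described above.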
Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Signed-off-by: Vadim Fedorenko <junk@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 14 Oct 2016 04:40:23 +0000 (21:40 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix various build warnings in tlan/qed/xen-netback drivers, from
Arnd Bergmann.
2) Propagate proper error code in strparser's strp_recv(), from Geert
Uytterhoeven.
3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric
Dumazet.
4) Need to use list_for_each_entry_safe() in qed driver, from Wei
Yongjun.
5) Openvswitch 802.1AD bug fixes from Jiri Benc.
6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert.
7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger.
8) stmmac driver fixes from Giuseppe CAVALLARO.
9) Fix access to mangled IP6CB in tcp, from Eric Dumazet.
10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
net: bridge: add the multicast_flood flag attribute to brport_attrs
net: axienet: Remove unused parameter from __axienet_device_reset
liquidio: CN23XX: fix a loop timeout
net: rtnl: info leak in rtnl_fill_vfinfo()
tipc: info leak in __tipc_nl_add_udp_addr()
net: ipv4: Do not drop to make_route if oif is l3mdev
net: phy: Trigger state machine on state change and not polling.
ipv6: tcp: restore IP6CB for pktoptions skbs
netvsc: Remove mistaken udp.h inclusion.
xen-netback: fix type mismatch warning
stmmac: fix error check when init ptp
stmmac: fix ptp init for gmac4
qed: fix old-style function definition
netvsc: fix checksum on UDP IPV6
net_sched: reorder pernet ops and act ops registrations
xen-netback: fix guest Rx stall detection (after guest Rx refactor)
drivers/ptp: Fix kernel memory disclosure
net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
qmi_wwan: add support for Quectel EC21 and EC25
openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
...
Linus Torvalds [Fri, 14 Oct 2016 04:28:20 +0000 (21:28 -0700)]
Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Stable bugfixes:
- sunrpc: fix write space race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid
Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"
* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...
Linus Torvalds [Fri, 14 Oct 2016 04:04:42 +0000 (21:04 -0700)]
Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:
- Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.
- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"
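For testers, a minimal user-space sketch of a copy_file_range() loop is shown here. The helper name copy_range_all() is hypothetical. Whether the copy is actually offloaded to the server depends on the client, the mount and the server, which this sketch does not check; on other filesystems the kernel simply performs the copy itself.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to 'len' bytes from the start of in_fd to the start of out_fd.
 * Returns the number of bytes copied, or -1 with errno set on error.
 */
static ssize_t copy_range_all(int in_fd, int out_fd, size_t len)
{
	off64_t off_in = 0, off_out = 0;
	size_t left = len;

	while (left > 0) {
		ssize_t n = copy_file_range(in_fd, &off_in, out_fd, &off_out,
					    left, 0);
		if (n < 0)
			return -1;
		if (n == 0)
			break; /* reached end of input early */
		left -= (size_t)n;
	}
	return (ssize_t)(len - left);
}
```

The loop is needed because copy_file_range() may copy fewer bytes than requested in a single call.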
* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
Linus Torvalds [Fri, 14 Oct 2016 03:28:22 +0000 (20:28 -0700)]
Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git./linux/kernel/git/dgc/linux-xfs
< XFS has gained super CoW powers! >
 ----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Pull XFS support for shared data extents from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle. This
pullreq contains the new shared data extents feature for XFS.
Given the complexity and size of this change I am expecting - like the
addition of reverse mapping last cycle - that there will be some
follow-up bug fixes and cleanups around the -rc3 stage for issues that
I'm sure will show up once the code hits a wider userbase.
What it is:
At the most basic level we are simply adding shared data extents to
XFS - i.e. a single extent on disk can now have multiple owners. To do
this we have to add new on-disk features to both track the shared
extents and the number of times they've been shared. This is done by
the new "refcount" btree that sits in every allocation group. When we
share or unshare an extent, this tree gets updated.
Along with this new tree, the reverse mapping tree needs to be updated
to track each owner of a shared extent. This also needs to be updated on
every share/unshare operation. These interactions at extent allocation
and freeing time have complex ordering and recovery constraints, so
there's a significant amount of new intent-based transaction code to
ensure that operations are performed atomically from both the runtime
and integrity/crash recovery perspectives.
We also need to break sharing when writes hit a shared extent - this
is where the new copy-on-write implementation comes in. We allocate
new storage and copy the original data along with the overwrite data
into the new location. We only do this for data as we don't share
metadata at all - each inode has its own metadata that tracks the
shared data extents, the extents undergoing CoW and its own private
extents.
Of course, being XFS, nothing is simple - we use delayed allocation
for CoW similar to how we use it for normal writes. ENOSPC is a
significant issue here - we build on the reservation code added in
4.8-rc1 with the reverse mapping feature to ensure we don't get
spurious ENOSPC issues part way through a CoW operation. These
mechanisms also help minimise fragmentation due to repeated CoW
operations. To further reduce fragmentation overhead, we've also
introduced a CoW extent size hint, which indicates how large a region
we should allocate when we execute a CoW operation.
With all this functionality in place, we can hook up .copy_file_range,
.clone_file_range and .dedupe_file_range and we gain all the
capabilities of reflink and other vfs provided functionality that
enable manipulation to shared extents. We also added a fallocate mode
that explicitly unshares a range of a file, which we implemented as an
explicit CoW of all the shared extents in a file.
As such, it's a huge chunk of new functionality with new on-disk
format features and internal infrastructure. It warns at mount time as
an experimental feature and that it may eat data (as we do with all
new on-disk features until they stabilise). We have not released
userspace support for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time the kernel
with this code in it is released.
The new code causes 5-6 new failures with xfstests - these aren't
serious functional failures but things like the output of tests changing
slightly due to perturbations in layouts, space usage, etc. OTOH,
we've added 150+ new tests to xfstests that specifically exercise this
new functionality so it's got far better test coverage than any
functionality we've previously added to XFS.
Darrick has done a pretty amazing job getting us to this stage, and
special mention also needs to go to Christoph (review, testing,
improvements and bug fixes) and Brian (caught several intricate bugs
during review) for the effort they've also put in.
Summary:
- unshare range (FALLOC_FL_UNSHARE) support for fallocate
- copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
interface
- shared extent support for XFS
- copy-on-write support for shared extents
- copy_file_range support
- clone_file_range support (implements reflink)
- dedupe_file_range support
- defrag support for reverse mapping enabled filesystems"
* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
xfs: convert COW blocks to real blocks before unwritten extent conversion
xfs: rework refcount cow recovery error handling
xfs: clear reflink flag if setting realtime flag
xfs: fix error initialization
xfs: fix label inaccuracies
xfs: remove isize check from unshare operation
xfs: reduce stack usage of _reflink_clear_inode_flag
xfs: check inode reflink flag before calling reflink functions
xfs: implement swapext for rmap filesystems
xfs: refactor swapext code
xfs: various swapext cleanups
xfs: recognize the reflink feature bit
xfs: simulate per-AG reservations being critically low
xfs: don't mix reflink and DAX mode for now
xfs: check for invalid inode reflink flags
xfs: set a default CoW extent size of 32 blocks
xfs: convert unwritten status of reverse mappings for shared files
xfs: use interval query for rmap alloc operations on shared files
xfs: add shared rmap map/unmap/convert log item types
xfs: increase log reservations for reflink
...
Linus Torvalds [Fri, 14 Oct 2016 00:08:58 +0000 (17:08 -0700)]
Merge tag 'pci-v4.9-changes-2' of git://git./linux/kernel/git/helgaas/pci
PCI changes for the v4.9 merge window:
"Here are some more changes I'd like to have in v4.9. There's one
small Tegra bug fix in the PHY poweroff path, which is only used in
failure paths.
The rest is all strictly cleanup that should make host bridge drivers
more readable, but shouldn't actually change any behavior.
Summary:
- use local struct device pointers in many host bridge drivers for
clarity
- remove unused platform data
- use generic DesignWare accessors
- misc cleanups: remove redundant structure entries and re-order
structure members to put common generic fields first etc"
* tag 'pci-v4.9-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits)
MAINTAINERS: Add maintainer for the PCIe Marvell Armada 8K driver
MAINTAINERS: Add DT binding to the Aardvark PCIe driver maintainer
PCI: rockchip: Indent "if" statement body
PCI: hisi: Reorder struct hisi_pcie
PCI: hisi: Pass device-specific struct to internal functions
PCI: hisi: Include register block base in PCIE_SYS_STATE4 address
PCI: dra7xx: Reorder struct dra7xx_pcie
PCI: xilinx-nwl: Remove unused platform data
PCI: xilinx-nwl: Add local struct device pointers
PCI: xilinx: Removed unused xilinx_pcie_assign_msi() argument
PCI: xilinx: Remove unused platform data
PCI: xilinx: Add local struct device pointers
PCI: xgene: Add register accessors
PCI: xgene: Pass struct xgene_pcie_port to setup functions
PCI: xgene: Remove unused platform data
PCI: tegra: Remove unused platform data
PCI: tegra: Add local struct device pointers
PCI: tegra: Fix argument order in tegra_pcie_phy_disable()
PCI: rockchip: Remove unused platform data
PCI: rcar-gen2: Add local struct device pointers
...