Andrew Lunn [Fri, 11 Mar 2016 23:01:40 +0000 (00:01 +0100)]
phy: fixed: Fix removal of phys.
The fixed phys delete function simply removed the fixed phy from the
internal linked list and freed the memory. It however did not
unregister the associated phy device. This meant it was still possible
to find the phy device on the mdio bus.
Make fixed_phy_del() an internal function and add a
fixed_phy_unregister() which unregisters the phy device and then uses
fixed_phy_del() to free resources.
Modify DSA to use this new API function, so we don't leak phys.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 11 Mar 2016 23:01:39 +0000 (00:01 +0100)]
dsa: dsa: Fix freeing of fixed-phys from user ports.
All port types can have a fixed PHY associated with them. Remove the
check which limits removal to only CPU and DSA ports.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 11 Mar 2016 23:01:38 +0000 (00:01 +0100)]
dsa: Destroy fixed link phys after the phy has been disconnected
The phy is disconnected from the slave in dsa_slave_destroy(). Don't
destroy fixed link phys until after this, since there can be fixed
link phys connected to ports.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 11 Mar 2016 23:01:37 +0000 (00:01 +0100)]
dsa: slave: Don't reference NULL pointer during phy_disconnect
When the phy is disconnected, the parent pointer to the netdev it was
attached to is set to NULL. The code then tries to suspend the phy,
but dsa_slave_fixed_link_update needs the parent pointer to determine
which switch the phy is connected to. So it dereferenced a NULL
pointer. Check for this condition.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 11 Mar 2016 23:01:36 +0000 (00:01 +0100)]
dsa: Rename mv88e6123_61_65 to mv88e6123 to be consistent
All the drivers support multiple chips, but mv88e6123_61_65 is the
only one that reflects this in its naming. Change it to be consistent
with the other drivers.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 19:31:59 +0000 (15:31 -0400)]
Merge branch 'of_mdio-checks'
Sergei Shtylyov says:
====================
of_mdio: use IS_ERR_OR_NULL() and PTR_ERR_OR_ZERO()
Here's the set of 3 patches against DaveM's 'net-next.git' repo. They deal
with some error checks in the device tree MDIO code...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 12 Mar 2016 21:34:58 +0000 (00:34 +0300)]
of_mdio: use PTR_ERR_OR_ZERO()
PTR_ERR_OR_ZERO() is open coded in of_phy_register_fixed_link(), so just
call it directly...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 12 Mar 2016 21:34:02 +0000 (00:34 +0300)]
of_mdio: use IS_ERR_OR_NULL()
IS_ERR_OR_NULL() is open coded in of_mdiobus_register_phy(), so just call
it directly...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
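The open-coded patterns that the two of_mdio commits above replace can be
sketched in userspace as follows. This is only an illustration: the helpers
below are simplified stand-ins written for this example, not the kernel's
<linux/err.h>, and the open_coded_*() functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

/* Error pointers live in the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

static inline long PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? PTR_ERR(ptr) : 0;
}

/* Hypothetical open-coded equivalents, as a caller might have written them. */
long open_coded_err_or_zero(const void *p)
{
	if (IS_ERR(p))
		return PTR_ERR(p);
	return 0;
}

int open_coded_err_or_null(const void *p)
{
	return !p || IS_ERR(p);
}
```

Calling the helper directly keeps the intent visible at the call site and
avoids subtle differences between hand-written copies.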
Sergei Shtylyov [Sat, 12 Mar 2016 21:33:13 +0000 (00:33 +0300)]
of_mdio: mdio_device_create() never returns NULL
mdio_device_create() never returns NULL, thus checking for it in
of_mdiobus_register_device() is pointless...
Suggested-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 19:27:23 +0000 (15:27 -0400)]
Merge branch 'thunderx-phy'
David Daney says:
====================
net/phy: Improvements to Cavium Thunder MDIO code.
Changes from v1:
- In 1/3 Add back check for non-OF objects in bgx_init_of_phy(). It
is probably not necessary, but better safe than sorry...
The firmware on many Cavium Thunder systems configures the MDIO bus
hardware to be probed as a PCI device. In order to use the MDIO bus
drivers in this configuration, we must add PCI probing to the driver.
There are two parts to this set of three patches:
1) Cleanup the PHY probing code in thunder_bgx.c to handle the case
where there is no PHY attached to a port, as well as being more
robust in the face of driver loading order by use of
-EPROBE_DEFER.
2) Split mdio-octeon.c into two drivers, one with platform probing,
and the other with PCI probing. Common code is shared between the
two.
Tested on several different Thunder and OCTEON systems, also compile
tested on x86_64.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Fri, 11 Mar 2016 17:53:11 +0000 (09:53 -0800)]
phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.
The Cavium Thunder SoCs have multiple MDIO buses that are part of a
single PCI device. To model this in the device tree we call the PCI
parent device a "cavium,thunder-8890-mdio-nexus", it has several
children, one for each MDIO bus.
The MDIO bus hardware is identical to that found in the OCTEON SoCs,
so we use that code for things that are not part of the PCI driver
probe/remove.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Fri, 11 Mar 2016 17:53:10 +0000 (09:53 -0800)]
phy: mdio-octeon: Refactor into two files/modules
A follow-on patch uses PCI probing to find the Thunder MDIO hardware.
In preparation for this, split out the common code into a new file
mdio-cavium.c, which will be used by both the existing OCTEON driver,
and the new Thunder PCI based driver.
As part of the refactoring simplify the struct cavium_mdiobus by
removing fields that are only ever used in the probe function and can
just as well be local variables.
Use readq/writeq in preference to readq_relaxed/writeq_relaxed as the
relaxed form was an optimization for an early chip revision, and the
MDIO drivers are not performance bottlenecks that need optimization in
the first place.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Fri, 11 Mar 2016 17:53:09 +0000 (09:53 -0800)]
net: thunderx: Cleanup PHY probing code.
Remove the call to force the octeon-mdio driver to be loaded. Allow
the standard driver loading mechanisms to load the PHY drivers, and
use -EPROBE_DEFER to cause the BGX driver to be probed only after the
PHY drivers are available.
Reorder the setting of MAC addresses and PHY probing to allow BGX
LMACs with no attached PHY to still be assigned a MAC address.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
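The deferred-probe idea used by the thunderx cleanup above can be sketched
roughly like this. It is a userspace illustration only; fake_phy,
fake_phy_driver_load() and fake_mac_probe() are hypothetical names, and the
real driver core retries the probe on its own once other drivers have bound.

```c
#include <stddef.h>

#define EPROBE_DEFER 517	/* same numeric value the kernel uses */

struct fake_phy { int id; };

/* Hypothetical registry: the PHY only appears once its driver has loaded. */
static struct fake_phy *registered_phy;

void fake_phy_driver_load(struct fake_phy *phy)
{
	registered_phy = phy;
}

/* MAC-side probe: ask to be retried later if the PHY is not there yet. */
int fake_mac_probe(struct fake_phy **out)
{
	if (!registered_phy)
		return -EPROBE_DEFER;

	*out = registered_phy;
	return 0;
}
```

With this shape the MAC driver no longer has to force another module to
load; it simply reports that it cannot finish yet.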
Anna-Maria Gleixner [Fri, 11 Mar 2016 09:10:23 +0000 (10:10 +0100)]
net: mvneta: Add missing hotplug notifier transition
The mvneta_percpu_notifier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE fails, the
driver is not well configured on the CPU.
Add handling for CPU_DOWN_FAILED[_FROZEN] hotplug notifier transition
to set up the driver.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
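A rough sketch of the missing transition described in the mvneta commit
above. The constant names mirror the kernel's hotplug notifier actions, but
the values, struct fake_port and fake_percpu_notifier() are assumptions made
for this example. Masking out the frozen bit is a simplification; a driver
may instead list the _FROZEN cases explicitly.

```c
/* Illustrative action codes for this sketch. */
enum {
	CPU_ONLINE       = 0x0002,
	CPU_DOWN_PREPARE = 0x0005,
	CPU_DOWN_FAILED  = 0x0006,
	CPU_TASKS_FROZEN = 0x0010,
};

struct fake_port { int configured_on_cpu; };

void fake_percpu_notifier(struct fake_port *pp, unsigned long action)
{
	/* Masking the frozen bit folds the _FROZEN variants into one case. */
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
	case CPU_DOWN_FAILED:	/* offline attempt was aborted: set up again */
		pp->configured_on_cpu = 1;
		break;
	case CPU_DOWN_PREPARE:	/* CPU is about to go away: tear down */
		pp->configured_on_cpu = 0;
		break;
	}
}
```

Without the CPU_DOWN_FAILED case, a port torn down in CPU_DOWN_PREPARE
would stay unconfigured on a CPU that never actually went offline.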
Igal Liberman [Sun, 13 Mar 2016 19:14:43 +0000 (21:14 +0200)]
fsl/fman: fix dtsec_set_tx_pause_frames
Fix a bug introduced in
e06a03b (fsl/fman: fix the pause_time test)
When pause_time is set to '0' - pause frames are disabled and
there's no need to apply dTSEC-A003 Errata workaround.
Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 14 Mar 2016 17:55:50 +0000 (10:55 -0700)]
Documentation: networking: phy.txt: Add missing functions
Some new development in PHYLIB added new function pointers to the struct
phy_driver, document these.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Mon, 14 Mar 2016 17:52:15 +0000 (10:52 -0700)]
tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
Per RFC4898, they count segments sent/received
containing a positive length data segment (that includes
retransmission segments carrying data). Unlike
tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
carrying no data (e.g. pure ack).
The patch also updates the segs_in in tcp_fastopen_add_skb()
so that segs_in >= data_segs_in property is kept.
Together with retransmission data, tcpi_data_segs_out
gives a better signal on the rxmit rate.
v6: Rebase on the latest net-next
v5: Eric pointed out that checking skb->len is still needed in
tcp_fastopen_add_skb() because skb can carry a FIN without data.
Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
helper is used. Comment is added to the fastopen case to explain why
segs_in has to be reset and tcp_segs_in() has to be called before
__skb_pull().
v4: Add comment to the changes in tcp_fastopen_add_skb()
and also add remark on this case in the commit message.
v3: Add const modifier to the skb parameter in tcp_segs_in()
v2: Rework based on recent fix by Eric:
commit a9d99ce28ed3 ("tcp: fix tcpi_segs_in after connection establishment")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
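The counting rule from the tcp commit above can be sketched as below. The
structs are heavily simplified stand-ins (fake_skb, fake_tcp and
fake_tcp_segs_in() are hypothetical), so this shows the accounting idea
rather than the kernel helper itself.

```c
#include <stdint.h>

/* Simplified stand-ins for the packet and the per-connection counters. */
struct fake_skb {
	uint32_t len;		/* bytes including the TCP header */
	uint32_t hdr_len;	/* TCP header length */
	uint16_t gso_segs;	/* 0 when the packet is a single segment */
};

struct fake_tcp {
	uint32_t segs_in;
	uint32_t data_segs_in;
};

void fake_tcp_segs_in(struct fake_tcp *tp, const struct fake_skb *skb)
{
	uint16_t segs = skb->gso_segs ? skb->gso_segs : 1;

	tp->segs_in += segs;		/* every segment counts here */
	if (skb->len > skb->hdr_len)	/* only segments carrying payload */
		tp->data_segs_in += segs;
}
```

Because both counters are updated in one place, segs_in >= data_segs_in
holds by construction.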
Arnd Bergmann [Mon, 14 Mar 2016 14:53:57 +0000 (15:53 +0100)]
vmxnet3: fix lock imbalance in vmxnet3_tq_xmit()
A recent bug fix rearranged the code in vmxnet3_tq_xmit() in a
way that left the error handling for oversized headers unlock
a lock that had not been taken yet. Gcc warns about the incorrect
use of the 'flags' variable because of that:
drivers/net/vmxnet3/vmxnet3_drv.c: In function 'vmxnet3_tq_xmit.constprop':
include/linux/spinlock.h:246:3: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This changes the error handling path to 'goto' the end of the function
beyond the lock/unlock pair.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: cec05562fb1d ("vmxnet3: avoid calling pskb_may_pull with interrupts disabled")
Signed-off-by: David S. Miller <davem@davemloft.net>
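The shape of the vmxnet3 fix above, as a small userspace sketch. The lock is
modelled as a depth counter and all names (fake_lock, fake_xmit) are
hypothetical; the point is only that the early error path jumps past the
lock/unlock pair.

```c
struct fake_lock { int depth; };

static void fake_lock_acquire(struct fake_lock *l) { l->depth++; }
static void fake_lock_release(struct fake_lock *l) { l->depth--; }

int fake_xmit(struct fake_lock *lock, unsigned int hdr_len, unsigned int max_hdr)
{
	int ret = 0;

	if (hdr_len > max_hdr) {
		ret = -1;
		goto out;	/* lock not taken yet, so skip the unlock */
	}

	fake_lock_acquire(lock);
	/* ... queue the packet while holding the lock ... */
	fake_lock_release(lock);
out:
	return ret;
}
```

Jumping to a label placed before the unlock would release a lock that was
never acquired, which is the imbalance the commit describes.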
David S. Miller [Mon, 14 Mar 2016 17:09:50 +0000 (13:09 -0400)]
Merge branch 'net-gcc60-fixes'
Arnd Bergmann says:
====================
net: gcc-6.0 warning fixes
I've just installed gcc-6.0 to see what kinds of new warnings
we get. It turns out that it's actually really useful once I
disabled -Wunused-const-variable, and all of the warnings it
found in network drivers seem valid.
Sorry for the bad timing in the merge window, but I figured
it would be better to send the fixes as I found the bugs
rather than waiting for the next cycle. The first three
look appropriate for stable backports.
The other two only fix a gcc warning about incorrect whitespace,
probably not worth backporting those.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 14 Mar 2016 14:18:38 +0000 (15:18 +0100)]
net: caif: fix misleading indentation
gcc points out code that is not indented the way it is
interpreted:
net/caif/cfpkt_skbuff.c: In function 'cfpkt_setlen':
net/caif/cfpkt_skbuff.c:289:4: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
return cfpkt_getlen(pkt);
^~~~~~
net/caif/cfpkt_skbuff.c:286:3: note: ...this 'else' clause, but it is not
else
^~~~
It is clear from the context that not returning here would be
a bug, as we'd end up passing a negative length into a function
that takes a u16 length, so it is not missing curly braces
here, and I'm assuming that the indentation is the only part
that's wrong about it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 14 Mar 2016 14:18:37 +0000 (15:18 +0100)]
ath9k: fix misleading indentation
A cleanup patch in linux-3.18 moved around some code in the ath9k
driver and left some code to be indented in a misleading way,
made worse by the addition of some new code for p2p mode, as
discovered by a new gcc-6 warning:
drivers/net/wireless/ath/ath9k/init.c: In function 'ath9k_set_hw_capab':
drivers/net/wireless/ath/ath9k/init.c:851:4: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
hw->wiphy->iface_combinations = if_comb;
^~
drivers/net/wireless/ath/ath9k/init.c:847:3: note: ...this 'if' clause, but it is not
if (ath9k_is_chanctx_enabled())
^~
The code is in fact correct, but the indentation is not, so I'm
reformatting it as it should have been after the original cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 499afaccf6f3 ("ath9k: Isolate ath9k_use_chanctx module parameter")
Fixes: eb61f9f623f7 ("ath9k: advertise p2p dev support when chanctx")
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 14 Mar 2016 14:18:36 +0000 (15:18 +0100)]
ath9k: fix buffer overrun for ar9287
Code that was added back in 2.6.38 has an obvious overflow
when accessing a static array, and at the time it was added
only a code comment was put in front of it as a reminder
to have it reviewed properly.
This has not happened, but gcc-6 now points to the specific
overflow:
drivers/net/wireless/ath/ath9k/eeprom.c: In function 'ath9k_hw_get_gain_boundaries_pdadcs':
drivers/net/wireless/ath/ath9k/eeprom.c:483:44: error: array subscript is above array bounds [-Werror=array-bounds]
maxPwrT4[i] = data_9287[idxL].pwrPdg[i][4];
~~~~~~~~~~~~~~~~~~~~~~~~~^~~
It turns out that the correct array length exists in the local
'intercepts' variable of this function, so we can just use that
instead of hardcoding '4', so this patch changes all three
instances to use that variable. The other two instances were
already correct, but it's more consistent this way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 940cd2c12ebf ("ath9k_hw: merge the ar9287 version of ath9k_hw_get_gain_boundaries_pdadcs")
Signed-off-by: David S. Miller <davem@davemloft.net>
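The ath9k overrun above comes down to indexing with a hardcoded constant
instead of the actual row length. A minimal sketch, assuming a hypothetical
helper last_intercept() and caller-supplied row sizes (the real array
dimensions are not given in the commit text):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the last calibration point of a row holding `intercepts` entries.
 * The caller must pass intercepts >= 1.
 */
uint8_t last_intercept(const uint8_t *row, size_t intercepts)
{
	/* row[4] would only be valid for rows with at least five entries. */
	return row[intercepts - 1];
}
```

Deriving the index from the length keeps the access in bounds for every
chip variant that shares the function.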
Arnd Bergmann [Mon, 14 Mar 2016 14:18:35 +0000 (15:18 +0100)]
farsync: fix off-by-one bug in fst_add_one
gcc-6 finds an out of bounds access in the fst_add_one function
when calculating the end of the mmio area:
drivers/net/wan/farsync.c: In function 'fst_add_one':
drivers/net/wan/farsync.c:418:53: error: index 2 denotes an offset greater than size of 'u8[2][8192] {aka unsigned char[2][8192]}' [-Werror=array-bounds]
#define BUF_OFFSET(X) (BFM_BASE + offsetof(struct buf_window, X))
^
include/linux/compiler-gcc.h:158:21: note: in definition of macro '__compiler_offsetof'
__builtin_offsetof(a, b)
^
drivers/net/wan/farsync.c:418:37: note: in expansion of macro 'offsetof'
#define BUF_OFFSET(X) (BFM_BASE + offsetof(struct buf_window, X))
^~~~~~~~
drivers/net/wan/farsync.c:2519:36: note: in expansion of macro 'BUF_OFFSET'
+ BUF_OFFSET ( txBuffer[i][NUM_TX_BUFFER][0]);
^~~~~~~~~~
The warning is correct, but not critical because this appears
to be a write-only variable that is set by each WAN driver but
never accessed afterwards.
I'm taking the minimal fix here, using the correct pointer by
pointing 'mem_end' to the last byte inside of the register area
as all other WAN drivers do, rather than the first byte outside of
it. An alternative would be to just remove the mem_end member
entirely.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
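The off-by-one in the farsync commit above is the difference between the
last byte inside a region and the first byte after it. A sketch under
assumed names (fake_window and fake_set_window() are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

struct fake_window {
	uintptr_t mem_start;	/* first byte of the register area */
	uintptr_t mem_end;	/* last byte inside the register area */
};

void fake_set_window(struct fake_window *w, uintptr_t base, size_t size)
{
	w->mem_start = base;
	w->mem_end = base + size - 1;	/* base + size is one past the end */
}
```

With an inclusive end address, the region size is
mem_end - mem_start + 1.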
Arnd Bergmann [Mon, 14 Mar 2016 14:18:34 +0000 (15:18 +0100)]
mlx4: add missing braces in verify_qp_parameters
The implementation of QP paravirtualization back in linux-3.7 included
some code that looks very dubious, and gcc-6 has grown smart enough
to warn about it:
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3154:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
if (optpar & MLX4_QP_OPTPAR_ALT_ADDR_PATH) {
^~
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3144:4: note: ...this 'if' clause, but it is not
if (slave != mlx4_master_func_num(dev))
From looking at the context, I'm reasonably sure that the indentation
is correct but that it should have contained curly braces from the
start, as the update_gid() function in the same patch correctly does.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 54679e148287 ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop")
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 14 Mar 2016 14:07:12 +0000 (15:07 +0100)]
net: mediatek: check device_reset return code
The device_reset() function may fail, so we have to check
its return value, e.g. to make deferred probing work correctly.
gcc warns about it because of the warn_unused_result attribute:
drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function 'mtk_probe':
drivers/net/ethernet/mediatek/mtk_eth_soc.c:1679:2: error: ignoring return value of 'device_reset', declared with attribute warn_unused_result [-Werror=unused-result]
This adds the trivial error check to propagate the return value
to the generic platform device probe code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
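A sketch of the error check added by the device_reset commit above.
fake_device_reset() is a hypothetical stand-in for the real function; the
attribute shown is the GCC/Clang one that produces the unused-result
warning.

```c
#define EPROBE_DEFER 517	/* same numeric value the kernel uses */

/* Hypothetical reset helper; dropping its result triggers a warning. */
__attribute__((warn_unused_result))
int fake_device_reset(int controller_ready)
{
	return controller_ready ? 0 : -EPROBE_DEFER;
}

int fake_probe(int controller_ready)
{
	int err = fake_device_reset(controller_ready);

	if (err)
		return err;	/* propagate, so the probe can be retried */

	/* ... rest of the probe ... */
	return 0;
}
```

Returning the error unchanged is what lets a deferral reach the generic
probe code instead of being silently lost.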
Arnd Bergmann [Mon, 14 Mar 2016 14:07:11 +0000 (15:07 +0100)]
net: mediatek: remove incorrect dma_mask assignment
Device drivers should not mess with the DMA mask directly,
but instead call dma_set_mask() etc if needed.
In case of the mtk_eth_soc driver, the mask already gets set
correctly when the device is created, and setting it again
is against the documented API.
This removes the incorrect setting.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 14 Mar 2016 14:07:10 +0000 (15:07 +0100)]
net: mediatek: use dma_addr_t correctly
dma_alloc_coherent() expects a dma_addr_t pointer as its argument,
not an 'unsigned int', and gcc correctly warns about broken
code in the mtk_init_fq_dma function:
drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function 'mtk_init_fq_dma':
drivers/net/ethernet/mediatek/mtk_eth_soc.c:463:13: error: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
This changes the type of the local variable to dma_addr_t.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Mon, 14 Mar 2016 12:56:35 +0000 (09:56 -0300)]
net: Fix use after free in the recvmmsg exit path
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
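The ordering problem in the recvmmsg commit above can be illustrated with a
toy reference-counted object. Everything here (fake_sock, fake_sock_put(),
fake_recvmmsg_exit()) is hypothetical and much simpler than the real exit
path; it only shows that the object is written before the reference is
dropped.

```c
#include <stdlib.h>

struct fake_sock { int refs; int sk_err; };

struct fake_sock *fake_sock_new(void)
{
	struct fake_sock *s = calloc(1, sizeof(*s));

	if (s)
		s->refs = 1;
	return s;
}

void fake_sock_get(struct fake_sock *s) { s->refs++; }

void fake_sock_put(struct fake_sock *s)
{
	if (--s->refs == 0)
		free(s);	/* the object may be gone after this call */
}

/* received: datagrams already delivered; err: error from the last call. */
int fake_recvmmsg_exit(struct fake_sock *s, int received, int err)
{
	int ret = received ? received : err;

	if (received && err)
		s->sk_err = -err;	/* touch the socket first ... */
	fake_sock_put(s);		/* ... and only then drop the reference */
	return ret;
}
```

Swapping the last two steps would write to memory that the put may already
have freed.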
David S. Miller [Mon, 14 Mar 2016 16:33:37 +0000 (12:33 -0400)]
Merge branch 'thunderx-perf'
Sunil Goutham says:
====================
net: thunderx: Performance enhancement changes
The patches below attempt to improve performance by reducing the
number of atomic operations while allocating new receive buffers
and reducing cache misses by adjusting nicvf structure elements.
Changes from v1:
No changes, resubmitting afresh as per David's suggestion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 14 Mar 2016 11:06:15 +0000 (16:36 +0530)]
net: thunderx: Adjust nicvf structure to reduce cache misses
Adjusted the nicvf structure such that all elements used in the hot
path (napi, xmit, etc.) fall into the same cache line. This reduced
the number of cache misses and resulted in a ~2% increase in the
number of packets handled on a core.
Also changed elements declared with the :1 notation to boolean, to be
consistent with other element definitions.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
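A sketch of the layout idea from the nicvf commit above. The struct and its
fields are invented for illustration (fake_nic is hypothetical, and a
64-byte line size is assumed); a compile-time check documents that the hot
fields sit together at the front.

```c
#include <stdbool.h>
#include <stddef.h>

#define FAKE_CACHE_LINE 64	/* assumed cache line size */

struct fake_nic {
	/* hot path: kept together at the front of the structure */
	void *netdev;
	void *queues;
	unsigned int rx_count;
	bool hw_tso;		/* plain bool rather than a ":1" bitfield */
	bool link_up;
	/* cold: configuration and statistics */
	char name[64];
	unsigned long stats[32];
};

/* Fails the build if a later edit pushes hot fields past the first line. */
_Static_assert(offsetof(struct fake_nic, link_up) + sizeof(bool) <= FAKE_CACHE_LINE,
	       "hot fields must fit in the first cache line");
```

Whether the fields really share one line at run time also depends on how
the structure itself is aligned when it is allocated.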
Sunil Goutham [Mon, 14 Mar 2016 11:06:14 +0000 (16:36 +0530)]
net: thunderx: Set receive buffer page usage count in bulk
Instead of calling get_page() for every receive buffer carved out
of a page, set the page's usage count once at the end, to reduce the
number of atomic calls.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
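The bulk reference counting described above can be sketched with C11
atomics. fake_page and fake_carve_buffers() are hypothetical; the sketch
only shows one atomic add replacing one atomic increment per buffer.

```c
#include <stdatomic.h>

struct fake_page { atomic_int refcount; };

/* Carve nbufs buffers out of one page, taking all references at once. */
void fake_carve_buffers(struct fake_page *page, int nbufs)
{
	int taken = 0;

	for (int i = 0; i < nbufs; i++) {
		/* ... hand buffer i to the hardware ... */
		taken++;	/* plain local counter, no atomic operation */
	}

	if (taken)
		atomic_fetch_add(&page->refcount, taken);
}
```

The final count is the same as with per-buffer increments; only the number
of atomic operations changes.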
Richard Alpe [Mon, 14 Mar 2016 08:43:52 +0000 (09:43 +0100)]
tipc: make sure IPv6 header fits in skb headroom
Expand headroom further in order to be able to fit the larger IPv6
header. Prior to this patch this caused an skb under panic for certain
tipc packets when using IPv6 UDP bearer(s).
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 16:19:47 +0000 (12:19 -0400)]
Merge branch 'mvneta-hwbm'
Gregory CLEMENT says:
====================
API set for HW Buffer management
This is the sixth version of the API set for HW Buffer management (that was
initially submitted here:
http://thread.gmane.org/gmane.linux.kernel/2125152).
This version is just a rebasing onto the last net-next. I also added
the Tested-by flag from Sebastian Careba : "The patch set applies
successfully and it works well, no more Samba issues any longer".
For the record in the previous versions I made the following changes:
v4 -> v5:
- A field with the size of the buffers of the pool was added. It
then allows fixing some misused sizes in the mvneta_bm code when using
the new framework.
- Add a new patch from Marcin for sram allowing to require
non-bufferable access to the memory. It was needed for the hardware
buffer management of the mvneta.
- Fix the build issue notified by the 0-day builder when building the
drivers as module.
v3 -> v4
- Fix build issue when HWBM is not selected
v2 -> v3
- Make a HWBM and a SWBM version of the mvneta_rx() function in order
to reduce the conditional code. Kept a condition inside the
mvneta_poll because specializing this function would have meant
duplicating 95% of the code.
- Put back the register_netdev() call at the end of the mvneta_probe()
function. In order to have a unique ID for each port, just used a
global variable in the driver.
- Added a fix from Marcin in the "net: mvneta: bm: add support for
hardware buffer management" patch: "when dropping packets, only
buffer pointers passed from BM to descriptors have to be returned to
the pool. In submitted version after closing the port and
mvneta_rxq_deinit(), it was very likely that a lot of fake buffers
are added to the pool, because all descriptors took part in
iteration."
- Removed the select MVNETA_BM from the Kconfig, which lets users
choose whether or not to use it.
v1 -> v2
- The hardware buffer management helpers are no more built by default
and now depend on a hidden config symbol which has to be selected
by the driver if needed
- The hwbm_pool_refill() and hwbm_pool_add() now receive a gfp_t as
argument allowing the caller to specify the flag it needs.
- buf_num is now tested to ensure there is no wrapping
- A spinlock has been added to protect the hwbm_pool_add() function in
SMP or irq context.
- used pr_warn instead of pr_debug in case of errors.
- fixed the mvneta implementation by returning the buffer to the pool
at various place instead of ignoring it.
- Squashed "bus: mvebu-mbus: Fix size test for
mvebu_mbus_get_dram_win_info" into "bus: mvebu-mbus: provide api for
obtaining IO and DRAM window information".
- Added my Signed-off-by on all the patches as submitter of the series.
- Renamed the dts patches with the pattern "ARM: dts: platform:"
- Removed the patch "ARM: mvebu: enable SRAM support in
mvebu_v7_defconfig" of this series and already applied it
- Modified the order of the patches.
In order to ease the test the branch mvneta-BM-framework-v6 is
available at git@github.com:MISL-EBU-System-SW/mainline-public.git.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Mon, 14 Mar 2016 08:39:05 +0000 (09:39 +0100)]
net: mvneta: Use the new hwbm framework
Now that the hardware buffer management framework has been introduced,
let's use it.
Tested-by: Sebastian Careba <nitroshift@yahoo.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Mon, 14 Mar 2016 08:39:04 +0000 (09:39 +0100)]
net: add a hardware buffer management helper API
This basic implementation allows sharing code between drivers using
hardware buffer management. As the code is hardware agnostic, there are
few helpers; most of the optimization brought by a HW BM has to be
done at driver level.
Tested-by: Sebastian Careba <nitroshift@yahoo.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:39:03 +0000 (09:39 +0100)]
net: mvneta: bm: add support for hardware buffer management
Buffer manager (BM) is a dedicated hardware unit that can be used by all
ethernet ports of Armada XP and 38x SoC's. It allows to offload CPU on RX
path by sparing DRAM access on refilling buffer pool, hardware-based
filling of descriptor ring data and better memory utilization due to HW
arbitration for using 'short' pools for small packets.
Tests performed with A388 SoC working as a network bridge between two
packet generators showed increase of maximum processed 64B packets by
~20k (~555k packets with BM enabled vs ~535k packets without BM). Also
when pushing 1500B-packets with a line rate achieved, CPU load decreased
from around 25% without BM to 20% with BM.
BM comprises up to 4 buffer pointers' (BP) rings kept in DRAM, which
are called external BP pools - BPPE. Allocating and releasing buffer
pointers (BP) to/from BPPE is performed indirectly by write/read access
to a dedicated internal SRAM, where internal BP pools (BPPI) are placed.
BM hardware controls status of BPPE automatically, as well as assigning
proper buffers to RX descriptors. For more details please refer to
Functional Specification of Armada XP or 38x SoC.
In order to enable support for a separate hardware block, common for all
ports, a new driver has to be implemented ('mvneta_bm'). It provides
initialization sequence of address space, clocks, registers, SRAM,
empty pools' structures and also obtaining optional configuration
from DT (please refer to device tree binding documentation). mvneta_bm
exposes also a necessary API to mvneta driver, as well as a dedicated
structure with BM information (bm_priv), whose presence is used as a
flag notifying of BM usage by port. It has to be ensured that mvneta_bm
probe is executed prior to the ones in ports' driver. In case BM is not
used or its probe fails, mvneta falls back to use software buffer
management.
A sequence executed in mvneta_probe function is modified in order to have
an access to needed resources before possible port's BM initialization is
done. According to port-pools mapping provided by DT appropriate registers
are configured and the buffer pools are filled. RX path is modified
accordingly. Because the hardware allows a wide variety of configuration
options, the following assumptions are made:
* using BM mechanisms can be selectively disabled/enabled based
on DT configuration among the ports
* 'long' pool's single buffer size is tied to port's MTU
* using 'long' pool by port is obligatory and it cannot be shared
* using 'short' pool for smaller packets is optional
* one 'short' pool can be shared among all ports
This commit enables hardware buffer management operation cooperating with
existing mvneta driver. New device tree binding documentation is added and
the one of mvneta is updated accordingly.
[gregory.clement@free-electrons.com: removed the suspend/resume part]
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:39:02 +0000 (09:39 +0100)]
bus: mvebu-mbus: provide api for obtaining IO and DRAM window information
This commit enables finding appropriate mbus window and obtaining its
target id and attribute for given physical address in two separate
routines, both for IO and DRAM windows. This functionality
is needed for Armada XP/38x Network Controller's Buffer Manager and
PnC configuration.
[gregory.clement@free-electrons.com: Fix size test for
mvebu_mbus_get_dram_win_info]
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
[DRAM window information reference in LKv3.10]
Signed-off-by: Evan Wang <xswang@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Mon, 14 Mar 2016 08:39:01 +0000 (09:39 +0100)]
ARM: dts: armada-xp-openblocks-ax3-4: Add BM support
Allow the OpenBlocks AX3 to use hardware buffer management with mvneta.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:39:00 +0000 (09:39 +0100)]
ARM: dts: armada-xp: enable buffer manager support on Armada XP boards
Since mvneta driver supports using hardware buffer management (BM), in
order to use it, board files have to be adjusted accordingly. This commit
enables BM on AXP-DB and AXP-GP in same manner - because number of ports
on those boards is the same as number of possible pools, each port is
supposed to use single pool for all kind of packets.
Moreover appropriate entry is added to 'soc' node ranges, as well as "okay"
status for 'bm' and 'bm-bppi' (internal SRAM) nodes.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:38:59 +0000 (09:38 +0100)]
ARM: dts: armada-xp: add buffer manager nodes
Armada XP network controller supports hardware buffer management (BM).
Since it is now enabled in mvneta driver, appropriate nodes can be added
to armada-xp.dtsi - for the actual common BM unit (bm@c0000) and its
internal SRAM (bm-bppi), which is used for indirect access to buffer
pointer ring residing in DRAM.
Pools - ports mapping, bm-bppi entry in 'soc' node's ranges and optional
parameters are supposed to be set in board files.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:38:58 +0000 (09:38 +0100)]
ARM: dts: armada-38x: enable buffer manager support on Armada 38x boards
Since the mvneta driver supports hardware buffer management (BM), board
files have to be adjusted in order to use it. This commit enables BM on:
* A385-DB-AP - each port has its own pool for long packets and a common
pool for short packets,
* A388-ClearFog - same as above,
* A388-DB - unique 'short' and 'long' pools are mapped to each port,
* A388-GP - same as above.
Moreover, an appropriate entry is added to the 'soc' node ranges, as well as
"okay" status for the 'bm' and 'bm-bppi' (internal SRAM) nodes.
[gregory.clement@free-electrons.com: add support for the ClearFog board]
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:38:57 +0000 (09:38 +0100)]
ARM: dts: armada-38x: add buffer manager nodes
Armada 38x network controller supports hardware buffer management (BM).
Since it is now enabled in mvneta driver, appropriate nodes can be added
to armada-38x.dtsi - for the actual common BM unit (bm@c8000) and its
internal SRAM (bm-bppi), which is used for indirect access to buffer
pointer ring residing in DRAM.
Pools - ports mapping, bm-bppi entry in 'soc' node's ranges and optional
parameters are supposed to be set in board files.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 14 Mar 2016 08:38:56 +0000 (09:38 +0100)]
misc: sram: add optional ioremap without write combining
Some SRAM users may require non-bufferable access to the memory, which is
impossible, because devm_ioremap_wc() is used for setting sram->virt_base.
This commit adds an optional flag, 'no-memory-wc', which allows the remap
method to be chosen using a DT property. Documentation is updated accordingly.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 16:13:23 +0000 (12:13 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2016-03-14' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers patches for 4.6
Major changes:
rtl8xxxu
* add 8723bu support
wl18xx
* add radar_debug_mode debugfs file for DFS testing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 03:55:14 +0000 (23:55 -0400)]
Merge branch 'ipv4-ipv6-csums'
Alexander Duyck says:
====================
Fix differences between IPv4 and IPv6 TCP/UDP checksum calculation
This patch series is meant to address the differences that exist between
IPv4 and IPv6 in terms of checksum calculation. Specifically the IPv6
function csum_ipv6_magic treated length as a value that could be greater
than 64K, while csum_tcpudp_magic was truncating the length at 16 bits.
After looking over the code and giving it some thought I decided it would
be best to update the IPv4 function so that it worked the same way the IPv6
one did. This allows us to get the same results given the same inputs for
both functions. As a result we can use the same processes to reverse the
calculation in the event we need to do something like remove the length of
the pseudo-header checksum.
I also took the opportunity to standardize things so that the parameters
for these functions all use the correct types. IPv4 addresses are __be32,
length should always be __u32, and protocol is a __u8.
With this change in place it corrects an issue with UDP tunnels in which we
were getting a checksum that was off by 1 when performing fragmentation on
inner UDP packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
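As a rough illustration of the difference this series addresses, the following Python model (not kernel code) sums a pseudo-header once with the length taken as a full 32-bit value and once with the length truncated to 16 bits. The helper names and the sample addresses are assumptions made for this sketch; only the ones' complement arithmetic is meant to mirror the description above.

```python
def fold(total: int) -> int:
    """Fold a wide sum into 16 bits using end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_sum(saddr: int, daddr: int, length: int, proto: int) -> int:
    """Ones' complement sum of a pseudo-header, length treated as 32 bits."""
    words = [
        saddr >> 16, saddr & 0xFFFF,
        daddr >> 16, daddr & 0xFFFF,
        length >> 16, length & 0xFFFF,
        proto,
    ]
    return fold(sum(words))


def pseudo_sum_len16(saddr: int, daddr: int, length: int, proto: int) -> int:
    """Same sum, but with the length truncated to its low 16 bits first."""
    return pseudo_sum(saddr, daddr, length & 0xFFFF, proto)


# Hypothetical sample inputs: two IPv4 addresses and the UDP protocol number.
SADDR, DADDR, UDP = 0xC0A80001, 0xC0A80002, 17
```

For lengths below 64K the two helpers agree. Once the length exceeds 64K the truncated variant drops the upper half of the length, so the two sums diverge.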
Alexander Duyck [Fri, 11 Mar 2016 22:05:47 +0000 (14:05 -0800)]
GSO/UDP: Use skb->len instead of udph->len to determine length of original skb
It is possible for tunnels to end up generating IP or IPv6 datagrams that
are larger than 64K and expecting to be segmented. As such we need to deal
with length values greater than 64K. In order to accommodate this we need
to update the code to work with a 32b length value instead of a 16b one.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 11 Mar 2016 22:05:41 +0000 (14:05 -0800)]
ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short
This patch updates csum_ipv6_magic so that it correctly recognizes that
protocol is an unsigned 8-bit value.
This will allow us to better understand what limitations may or may not be
present in how we handle the data. For example there are a number of
places that call htonl on the protocol value. This is likely not necessary
and can be replaced with a multiplication by ntohl(1) which will be
converted to a shift by the compiler.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
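The remark above about replacing htonl on the protocol value can be checked with a small Python sketch. It uses the standard socket module's htonl and ntohl as stand-ins for the kernel helpers, which is an assumption of this illustration; the function names are made up for the example.

```python
import socket


def proto_word_htonl(proto: int) -> int:
    """Protocol placed in a 32-bit network-order word via a byte swap."""
    return socket.htonl(proto)


def proto_word_mul(proto: int) -> int:
    """Same word obtained by multiplying with ntohl(1).

    On little-endian hosts ntohl(1) is 1 << 24, so this is a shift by 24;
    on big-endian hosts it is 1 and the value is unchanged.
    """
    return proto * socket.ntohl(1)
```

Because the protocol is at most 8 bits wide, the multiplication cannot overflow 32 bits, which is why knowing the exact type matters for this kind of simplification.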
Alexander Duyck [Fri, 11 Mar 2016 22:05:34 +0000 (14:05 -0800)]
ipv4: Update parameters for csum_tcpudp_magic to their original types
This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs. For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value. The length is usually populated based
on skb->len which is an unsigned integer.
This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits. As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.
With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.
I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.
I also updated a few spots in terms of whitespace and type differences for
the addresses. Most of this was just to make sure all of the definitions
were in sync going forward.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
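The 16776960 figure quoted above follows from the shift by 8 in a 32-bit register. A minimal Python sketch of that arithmetic, with helper names invented for the illustration and the 32-bit register modelled by a mask:

```python
MASK32 = 0xFFFFFFFF
LIMIT = 16776960  # 2**24 - 256, the figure quoted in the commit message


def combine(length: int, proto: int) -> int:
    """Model of '(len + proto) << 8' evaluated in a 32-bit register."""
    return ((length + proto) << 8) & MASK32


def is_lossless(length: int, proto: int) -> bool:
    """True when shifting back recovers len + proto, i.e. no bits were lost."""
    return combine(length, proto) >> 8 == length + proto
```

With the largest possible protocol value (255), a length of LIMIT still fits, while LIMIT + 1 pushes the sum to 2**24 and the top bit falls off after the shift.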
David S. Miller [Mon, 14 Mar 2016 03:28:00 +0000 (23:28 -0400)]
ipv4: Don't do expensive useless work during inetdev destroy.
When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenario we do two pointless things which can
be very expensive if the number of assigned interfaces is large:
1) Address promotion. We are deleting all addresses, so there is no
point in doing this.
2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing
masq_dev_notifier so masq_inet_event() can skip this.
Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
David S. Miller [Mon, 14 Mar 2016 02:43:01 +0000 (22:43 -0400)]
Merge tag 'nfc-next-4.6-1' of git://git./linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.6 pull request
This is a very small one this time, with only 5 patches.
There are a couple of big items that could not be merged/finished
on time.
We have:
- 2 LLCP fixes for a race and a potential OOM.
- 2 cleanups for the pn544 and microread drivers.
- 1 Maintainer addition for the s3fwrn5 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 02:40:24 +0000 (22:40 -0400)]
Merge branch 'macsec'
Sabrina Dubroca says:
====================
MACsec IEEE 802.1AE implementation
MACsec (IEEE 802.1AE [0]) is a protocol that provides security for
wired ethernet LANs. MACsec offers two protection modes:
authentication only, or authenticated encryption.
MACsec defines "secure channels" that allow transmission from one node
to one or more others. Communication on a channel is done over a
succession of "secure associations", that each use a specific key.
Secure associations are identified by their "association number" in
the range 0..3. A secure association is retired when its 32-bit
packet number would wrap, and the same association number can later be
reused with a new key and packet number.
The standard mode of encryption is GCM AES with 128 bits keys,
although an extension allows 256 bits keys [1] (not implemented in
this submission).
When using MACsec, an extra header, called "SecTAG", is added between
the ethernet header and the original payload:
+---------------------------------+----------------+----------------+
| (MACsec ethertype) | TCI_AN | SL |
+---------------------------------+----------------+----------------+
| Packet Number |
+-------------------------------------------------------------------+
| Secure Channel Identifier |
| (optional) |
+-------------------------------------------------------------------+
TCI_AN:
version
end_station
sci_present
scb
encrypted
changed_text
association_number (2 bits)
SL:
short_length (6 bits)
unused (2 bits)
The ethertype for the packet is set to 0x88E5, and the original
ethertype becomes part of the secure payload, which may be encrypted.
The ethernet header and the SecTAG are always transmitted in the
clear, but are integrity-protected.
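To make the SecTAG layout above concrete, here is a small Python sketch that packs one. The bit positions of the TCI flags are an assumption: they follow the order in which the fields are listed above, most significant bit first, which is not spelled out in this cover letter. The function name and its parameters are hypothetical.

```python
import struct

MACSEC_ETHERTYPE = 0x88E5

# Assumed TCI bit positions, listed MSB first as in the field list above.
TCI_VERSION, TCI_ES, TCI_SC, TCI_SCB, TCI_E, TCI_C = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04
AN_MASK = 0x03  # association_number occupies the low 2 bits


def build_sectag(an, pn, *, encrypted=True, sci=None, short_length=0):
    """Pack ethertype, TCI_AN, SL, packet number and optional SCI."""
    if not 0 <= an <= AN_MASK:
        raise ValueError("association number must be in the range 0..3")
    tci_an = an
    if encrypted:
        tci_an |= TCI_E | TCI_C  # encrypted and changed_text
    if sci is not None:
        tci_an |= TCI_SC  # sci_present
    tag = struct.pack("!HBBI", MACSEC_ETHERTYPE, tci_an, short_length & 0x3F, pn)
    if sci is not None:
        tag += struct.pack("!Q", sci)  # 64-bit Secure Channel Identifier
    return tag
```

Without the optional SCI the tag is 8 bytes long; with it, 16 bytes.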
MACsec supports optional replay protection with a configurable replay
window.
MACsec is designed to be used with the MKA extension to 802.1X (MACsec
Key Agreement protocol) [2], which provides channel attribution and
key distribution to the nodes, but can also be used with static keys
getting fed manually by an administrator.
Optional (not supported yet) features:
- confidentiality offset: in encryption mode, part of the payload may
be left unencrypted.
- choice of cipher suite: GCM AES with 256 bits has been standardised
[1].
Implementation
A netdevice is created on top of a real device for each TX secure
channel, like we do for VLANs. Multiple TX channels can be created on
top of the same underlying device.
Several other approaches were considered for the RX path:
- dev_add_pack: doesn't work, because we want to filter out
unprotected packets
- transparent mode: MACsec would be enabled directly on the real
netdevice. For this, we cannot use a rx_handler directly because
MACsec must be available for underlying devices enslaved in a
bridge or in a bond, so we need a hook directly in
__netif_receive_skb_core. This approach makes it harder to filter
non-encrypted packets on RX without forcing the user to setup some
rules, so the "transparent" mode is not so transparent after all.
It also makes TX more complex than with a dedicated netdevice.
One issue with the proposed implementation is that the qdisc layer for
the real device operates on already encrypted packets.
Netlink API
This is currently a mix of rtnetlink (to create the device and set up
the TX channel) and genl (for RX channels, secure associations and
their keys). genl provides clean demultiplexing of the {TX,RX}{SC,SA}
commands.
Use cases
The normal use case is wired LANs, including veth and slave devices
for bonding/teaming or bridges.
MACsec can also be used on any device that makes a full ethernet
header visible, for example VXLAN.
The VXLAN+MACsec setup would be:
hypervisor | virtual machine
<real_dev>---<VXLAN>---|---<dev>---<macsec_dev>
And the packets would look like this:
| eth | IP | UDP | VXLAN | eth | MACsec | IP | ... | MACsec ICV |
One benefit of this approach to encryption in the cloud is that the
payload is encrypted by the tenant, not by the tunnel provider, thus
the tenant has full control over the keys.
Changes from v1:
- rework netlink API after discussion with Johannes Berg
- nest attributes, rename
- export stats as separate attributes
- add some comments
- misc small fixes (rcu, constants, struct organization)
Changes from RFCv2:
- fix ENCODING_SA param validation
- add parent link to netlink ifdumps
Changes from RFCv1:
- addressed comments from Florian and Paolo + kbuild robot
- also perform post-decrypt handling after crypto callback
- fixed ->dellink behavior
Future plans:
- offload to hardware, on nics that support it
- implement optional features
[0] http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf
[1] http://standards.ieee.org/getieee802/download/802.1AEbn-2011.pdf
[2] http://standards.ieee.org/getieee802/download/802.1X-2010.pdf
[3] RFCv1: http://www.spinics.net/lists/netdev/msg358151.html
[4] RFCv2: http://www.spinics.net/lists/netdev/msg362389.html
[5] v1: http://www.spinics.net/lists/netdev/msg367959.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:33 +0000 (18:07 +0100)]
macsec: introduce IEEE 802.1AE driver
This is an implementation of MACsec/IEEE 802.1AE. This driver
provides authentication and encryption of traffic in a LAN, typically
with GCM-AES-128, and optional replay protection.
http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:32 +0000 (18:07 +0100)]
net: add MACsec netdevice priv_flags and helper
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:31 +0000 (18:07 +0100)]
uapi: add MACsec bits
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
liping.zhang [Fri, 11 Mar 2016 15:08:36 +0000 (23:08 +0800)]
net: socket: use pr_info_once to tip the obsolete usage of PF_PACKET
There is no need to use the static variable here, pr_info_once is more
concise.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zefir Kurtisi [Fri, 11 Mar 2016 14:31:53 +0000 (15:31 +0100)]
at803x: fix suspend/resume for SGMII link
When operating the at803x in SGMII mode, resuming the chip
from power down brings up the copper-side link but leaves
the SGMII link in unconnected state (tested with at8031
attached to gianfar). In effect, this caused a permanent
link loss once the related interface was put down.
This patch ensures that power down handling in suspend()
and resume() is also applied to the SGMII link.
Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 02:35:36 +0000 (22:35 -0400)]
Merge branch 'net-more-bulk-free-users'
Jesper Dangaard Brouer says:
====================
net: bulk free adjustment and two driver use-cases
I've split out the bulk free adjustments from the bulk alloc patches,
as I want the adjustment to napi_consume_skb to be in the same kernel cycle
in which the API was introduced.
Adjustments based on discussion:
Subj: "mlx4: use napi_consume_skb API to get bulk free operations"
http://thread.gmane.org/gmane.linux.network/402503/focus=403386
Patchset based on net-next at commit 3ebeac1d0295
V4: more nitpicks from Sergei
V3: spelling fixes from Sergei
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Fri, 11 Mar 2016 08:44:17 +0000 (09:44 +0100)]
mlx5: use napi_consume_skb API to get bulk free operations
Bulk free of SKBs happens transparently via the API call napi_consume_skb().
The napi budget parameter is needed by napi_consume_skb() to detect
if called from netpoll.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Fri, 11 Mar 2016 08:44:08 +0000 (09:44 +0100)]
mlx4: use napi_consume_skb API to get bulk free operations
Bulk free of SKBs happens transparently via the API call napi_consume_skb().
The napi budget parameter is usually needed by napi_consume_skb()
to detect if called from netpoll. In this patch it has an extra meaning.
For the mlx4 driver, the mlx4_en_stop_port() call is done outside
NAPI/softirq context, and cleans up the entire TX ring via
mlx4_en_free_tx_buf(). The code in mlx4_en_free_tx_desc() for
freeing SKBs is shared with NAPI calls.
To handle this shared use, the zero budget indication is reused
and handled appropriately in napi_consume_skb(). To reflect this, the
variable is called napi_mode for the function call that needed
this distinction.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Fri, 11 Mar 2016 08:43:58 +0000 (09:43 +0100)]
net: adjust napi_consume_skb to handle non-NAPI callers
Some drivers reuse/share code paths that free SKBs between NAPI
and non-NAPI calls. Adjust napi_consume_skb to handle this
use-case.
Before, calls from netpoll (w/ IRQs disabled) were handled and
indicated with a zero budget. Use the same zero
indication to handle calls not originating from NAPI/softirq.
These are simply handled by using dev_consume_skb_any().
This adds an extra branch+call for the netpoll case (checking
in_irq() + irqs_disabled()), but that is okay as this is a slowpath.
Suggested-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chun-Hao Lin [Fri, 11 Mar 2016 06:21:14 +0000 (14:21 +0800)]
r8169: Remove unnecessary phy reset for pcie nic when setting link speed.
For a PCIe NIC, after the link speed is set and while there is no link, the
driver does not need to reset the PHY until the link comes up.
For some PCIe NICs, doing this reset will also reset the PHY speed-down
counter and prevent the PHY from automatically speeding down.
This patch fixes the issue reported at the following link.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1547151
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 10 Mar 2016 22:10:21 +0000 (23:10 +0100)]
mlxsw: pci: Implement reset done check
Firmware now tells us that the reset is done by passing a magic value
via register. Use it to shorten the wait in case this is supported.
With old firmware, we still wait until the timeout is reached.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
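The reset-done check described above amounts to polling for a magic value with the old timeout kept as a fallback. A minimal Python sketch of that pattern follows; the function name, its parameters and the callable used to read the register are all hypothetical and not taken from the mlxsw driver.

```python
import time


def wait_for_reset(read_ready, magic, timeout_s, poll_s=0.001):
    """Poll read_ready() until it returns magic or the timeout expires.

    Returns True if the magic value was seen (new firmware), or False if
    the whole timeout elapsed (old firmware that never reports it).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_ready() == magic:
            return True
        time.sleep(poll_s)
    return False
```

In both cases the caller continues after the call returns; the only difference is how long it had to wait.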
Marcelo Ricardo Leitner [Thu, 10 Mar 2016 21:33:07 +0000 (18:33 -0300)]
sctp: allow sctp_transmit_packet and others to use gfp
Currently sctp_sendmsg() triggers some calls that will allocate memory
with GFP_ATOMIC even when not necessary. In the case of
sctp_packet_transmit it will allocate a linear skb that will be used to
construct the packet and this may cause sends to fail due to ENOMEM more
often than anticipated, especially with big MTUs.
This patch thus allows it to inherit gfp flags from upper calls so that
it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
similar. All others, like retransmits or flushes started from BH, are
still allocated using GFP_ATOMIC.
In netperf tests this didn't result in any performance drawbacks when
memory is not too fragmented and made it trigger ENOMEM way less often.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Gauthier [Thu, 10 Mar 2016 16:14:59 +0000 (17:14 +0100)]
ovs: allow nl 'flow set' to use ufid without flow key
When we want to change a flow using netlink, we have to identify it to
be able to perform a lookup. Both the flow key and unique flow ID
(ufid) are valid identifiers, but we always have to specify the flow
key in the netlink message. When both attributes are there, the ufid
is used. The flow key is used to validate the actions provided by
the userland.
This commit allows the ufid to be used without having to provide the flow
key, as it is already done in the netlink 'flow get' and 'flow del'
path. The flow key remains mandatory when an action is provided.
Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Ferre [Thu, 10 Mar 2016 15:44:32 +0000 (16:44 +0100)]
net: macb: fix default configuration for GMAC on AT91
On AT91 SoCs, the User Register (USRIO) exposes a switch to configure the
"Reduced" or "Traditional" version of the Media Independent Interface
(RMII vs. MII or RGMII vs. GMII).
As on the older EMAC version, on GMAC, this switch is set by default to the
non-reduced type of interface, so use the existing capability and extend it to
GMII as well. We then keep the current logic in the macb_init() function.
The capabilities of sama5d2, sama5d4 and sama5d3 GEM interface are updated in
the macb_config structure to be able to properly enable them with a traditional
interface (GMII or MII).
Reported-by: Romain HENRIET <romain.henriet@l-acoustics.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Thu, 10 Mar 2016 12:58:58 +0000 (13:58 +0100)]
phy: remove documentation of removed members of phy_device structure
Commit e5a03bfd873c ("phy: Add an mdio_device structure") removed the addr,
bus and dev members of the phy_device structure.
This patch removes the documentation about those members.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Mar 2016 02:08:01 +0000 (22:08 -0400)]
Merge branch 'xen-netback-fix-multiple-extra-info-handling'
Paul Durrant says:
====================
xen-netback: fix multiple extra info handling
If a frontend passes multiple extra info fragments to netback on the guest
transmit side, because xen-netback does not account for this properly, only
a single ack response will be sent. This will eventually cause processing
of the shared ring to wedge.
This series re-imports the canonical netif.h from Xen, where the ring
protocol documentation has been updated, fixes this issue in xen-netback
and also adds a patch to reduce log spam.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 10 Mar 2016 12:30:28 +0000 (12:30 +0000)]
xen-netback: reduce log spam
Remove the "prepare for reconnect" pr_info in xenbus.c. It's largely
uninteresting and the states of the frontend and backend can easily be
observed by watching the (o)xenstored log.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 10 Mar 2016 12:30:27 +0000 (12:30 +0000)]
xen-netback: support multiple extra info fragments passed from frontend
The code does not currently support a frontend passing multiple extra info
fragments to the backend in a tx request. The xenvif_get_extras() function
handles multiple extra_info fragments but make_tx_response() assumes there
is only ever a single extra info fragment.
This patch modifies xenvif_get_extras() to pass back a count of extra
info fragments, which is then passed to make_tx_response() (after
possibly being stashed in pending_tx_info for deferred responses).
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 10 Mar 2016 12:30:26 +0000 (12:30 +0000)]
xen-netback: re-import canonical netif header
The canonical netif header (in the Xen source repo) and the Linux variant
have diverged significantly. Recently much documentation has been added to
the canonical header which is highly useful for developers making
modifications to either xen-netfront or xen-netback. This patch therefore
re-imports the canonical header in its entirety.
To maintain compatibility and some style consistency with the old Linux
variant, the header was stripped of its emacs boilerplate, and
post-processed and copied into place with the following commands:
ed -s netif.h << EOF
H
,s/NETTXF_/XEN_NETTXF_/g
,s/NETRXF_/XEN_NETRXF_/g
,s/NETIF_/XEN_NETIF_/g
,s/XEN_XEN_/XEN_/g
,s/netif/xen_netif/g
,s/xen_xen_/xen_/g
,s/^typedef.*$//g
,s/^ /${TAB}/g
w
$
w
EOF
indent --line-length 80 --linux-style netif.h \
-o include/xen/interface/io/netif.h
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
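The order of the substitutions in the ed script above matters: identifiers that already carry a XEN_ or xen_ prefix get a second prefix from the earlier rules, and the XEN_XEN_ and xen_xen_ rules then collapse it. The Python sketch below replays only the renaming rules on single lines; it leaves out the typedef removal, the tab conversion and the indent step, and the sample identifiers used with it are illustrative rather than quoted from netif.h.

```python
import re

# Renaming rules from the ed script, in the same order.
SUBSTITUTIONS = [
    (r"NETTXF_", "XEN_NETTXF_"),
    (r"NETRXF_", "XEN_NETRXF_"),
    (r"NETIF_", "XEN_NETIF_"),
    (r"XEN_XEN_", "XEN_"),
    (r"netif", "xen_netif"),
    (r"xen_xen_", "xen_"),
]


def rename(line: str) -> str:
    """Apply every substitution to all matches in the line, in order."""
    for pattern, replacement in SUBSTITUTIONS:
        line = re.sub(pattern, replacement, line)
    return line
```

For example, rename("struct netif_tx_request") gains the xen_ prefix, while a name that already starts with xen_netif is doubled by the fifth rule and restored by the sixth.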
Zhang Shengju [Thu, 10 Mar 2016 08:55:50 +0000 (08:55 +0000)]
netconf: add macro to represent all attributes
This patch adds the macro NETCONFA_ALL to represent all types of netconf
attributes for IPv4 and IPv6.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 10 Mar 2016 07:31:57 +0000 (15:31 +0800)]
sctp: fix the transports round robin issue when init is retransmitted
Prior to this patch, if we have two paths in one assoc at the beginning,
they may have the same params other than last_time_heard, and the paths
are tried like this:
1st cycle:
try trans1, fail.
then trans2 is selected (because its last_time_heard is after trans1's).
2nd cycle:
try trans2, fail.
then trans2 is selected (because its last_time_heard is after trans1's).
3rd cycle:
try trans2, fail.
then trans2 is selected (because its last_time_heard is after trans1's).
....
trans1 will never have a chance to be selected, which is not what we expect.
We should keep round-robining over all the paths if they have just been
added at the beginning.
So first, every transport's last_time_heard should be initialized to 0, so
that we ensure they have the same value at the beginning; only in this way
do all the transports get an equal chance to be selected.
Then sctp_trans_elect_best should return trans_next when
*trans == *trans_next, so that we can try the next one if it fails, but
currently it always returns trans. We can fix this by exchanging these two
params when we call sctp_trans_elect_tie().
Fixes: 4c47af4d5eb2 ('net: sctp: rework multihoming retransmission path selection to rfc4960')
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
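The two halves of the fix above can be seen in a deliberately simplified Python model. It is not the kernel's selection logic: the Transport class, the tie-break rule that looks only at last_time_heard, and the candidate_first switch are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    name: str
    last_time_heard: int = 0


def elect_tie(first, second):
    """Prefer second only if it was heard from more recently; a full tie
    returns first, so the argument order decides who wins a tie."""
    return second if second.last_time_heard > first.last_time_heard else first


def next_path(paths, current, *, candidate_first):
    """Choose the path for the next INIT retransmission."""
    candidate = paths[(paths.index(current) + 1) % len(paths)]
    if candidate_first:
        return elect_tie(candidate, current)  # a tie moves on to the candidate
    return elect_tie(current, candidate)      # a tie stays on the current path


def retransmit_sequence(paths, tries, *, candidate_first):
    """Names of the paths used for a run of failed attempts."""
    current, used = paths[0], []
    for _ in range(tries):
        used.append(current.name)
        current = next_path(paths, current, candidate_first=candidate_first)
    return used
```

In this model, equal last_time_heard values combined with the candidate passed first give a plain alternation between the paths. If one transport starts with a later last_time_heard, selection sticks to it regardless of argument order, which is why both the zero initialization and the swapped parameters are needed.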
David Howells [Wed, 9 Mar 2016 23:22:56 +0000 (23:22 +0000)]
rxrpc: Replace all unsigned with unsigned int
Replace all "unsigned" types with "unsigned int" types.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Mar 2016 19:03:34 +0000 (15:03 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2016-03-09' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers patches for 4.6
Major changes:
ath10k
* dt: add bindings for ipq4019 wifi block
* start adding support for qca4019 chip
ath9k
* add device ID for Toshiba WLM-20U2/GN-1080
* allow more than one interface on DFS channels
bcma
* move flash detection code to ChipCommon core driver
brcmfmac
* IPv6 Neighbor discovery offload
* driver settings that can be populated from different sources
* country code setting in firmware
* length checks to validate firmware events
* new way to determine device memory size needed for BCM4366
* various offloads during Wake on Wireless LAN (WoWLAN)
* full Management Frame Protection (MFP) support
iwlwifi
* add support for thermal device / cooling device
* improvements in scheduled scan without profiles
* new firmware support (-21.ucode)
* add MSIX support for 9000 devices
* enable MU-MIMO and take care of firmware restart
* add support for large SKBs in mvm to reach A-MSDU
* add support for filtering frames from a BA session
* start implementing the new Rx path for 9000 devices
* enable the new Radio Resource Management (RRM) nl80211 feature flag
* add a new module parameter to disable VHT
* build infrastructure for Dynamic Queue Allocation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Mar 2016 19:01:00 +0000 (15:01 -0400)]
Merge branch 'net-minor-cleanups-and-optimizations'
Alexander Duyck says:
====================
A couple of minor clean-ups and optimizations
This patch series is basically just a v2 of a couple patches I recently
submitted.
The two patches aren't technically related; they are just items I found
while cleaning up and prepping some further work to enable Tx checksums for
tunnels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 9 Mar 2016 17:25:26 +0000 (09:25 -0800)]
csum: Update csum_block_add to use rotate instead of byteswap
The code for csum_block_add was doing a funky byteswap to swap the even and
odd bytes of the checksum if the offset was odd. Instead of doing this we
can save ourselves some trouble and just rotate by 8 as this should have the
same effect in terms of the final checksum value and only requires one
instruction.
In addition we can update csum_block_sub to just use csum_block_add with an
inverse value for csum2. This way we follow the same code path as
csum_block_add without having to duplicate it.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
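The claim above that an 8-bit rotate has the same effect as the byteswap on the final checksum can be checked with a short Python model. The helper names are made up for this sketch, and the 32-bit partial checksum is modelled with a mask.

```python
MASK32 = 0xFFFFFFFF


def fold(csum: int) -> int:
    """Fold a 32-bit partial checksum to 16 bits with end-around carry."""
    while csum >> 16:
        csum = (csum & 0xFFFF) + (csum >> 16)
    return csum


def swap_bytes(csum: int) -> int:
    """Swap the even and odd bytes within each 16-bit half."""
    return (((csum & 0x00FF00FF) << 8) + ((csum >> 8) & 0x00FF00FF)) & MASK32


def ror8(csum: int) -> int:
    """Rotate the 32-bit value right by 8 bits."""
    return ((csum >> 8) | (csum << 24)) & MASK32
```

The two transformed values differ as 32-bit words, but their 16-bit halves add up to the same total, so folding them gives the same result.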
Alexander Duyck [Wed, 9 Mar 2016 17:24:23 +0000 (09:24 -0800)]
gro: Defer clearing of flush bit in tunnel paths
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler. Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE. By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sat, 12 Mar 2016 11:03:27 +0000 (12:03 +0100)]
rocker: move ageing_time from struct rocker to struct ofdpa
This is OF-DPA specific, used only there, similar to
ofdpa_port->ageing_time. So move it to OF-DPA code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Mar 2016 20:20:21 +0000 (15:20 -0500)]
Merge branch 'qed-mf-updates'
Yuval Mintz says:
====================
qed: Management firmware updates
This series contains several changes to driver interaction with the
management fw.
The biggest [& most significant] change here is a change in the locking
scheme and re-definition of the 'critical section' when accessing shared
resources toward the goal of interacting with the management firmware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 9 Mar 2016 07:16:26 +0000 (09:16 +0200)]
qed: Enlarge the drain timeout
In the scenario where slowpath configuration isn't passing due to
various pause configurations affecting the chip, the theoretical time
required in worst-case-scenario to empty hw fifos sufficiently to
guarantee that slowpath configuration would flow is currently
insufficient.
This increases such a drain request to the theoretical maximum.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zvi Nachmani [Wed, 9 Mar 2016 07:16:25 +0000 (09:16 +0200)]
qed: Notify of transceiver changes
Handle a new message from the MFW, one that indicates that the transceiver
state has changed, and log that into the system logs.
Signed-off-by: Zvi Nachmani <Zvi.Nachmani@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Wed, 9 Mar 2016 07:16:24 +0000 (09:16 +0200)]
qed: Major changes to MB locking
Driver interaction with the management firmware is done via mailbox
commands which the management firmware periodically samples, as well
as placing of additional data in set places in the shared memory.
Each PF has a single designated mailbox address, and all flows that
require messaging to the management should use it.
This patch does 2 things:
1. It re-defines the critical section surrounding the mailbox sending -
that section should include the setting of the shared memory as well as
the sending of the command [otherwise a race might send a command with
the data of a different command].
2. It moves the locking scheme from using mutexes to using spinlocks.
This lays the groundwork for sending MFW commands from non-sleepable
contexts.
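
A minimal user-space sketch of the two points above, assuming a C11 atomic_flag as the spinlock. struct mailbox and the mb_*() helpers are hypothetical names and do not reflect the qed driver's actual data structures. What matters is that the shared-memory write and the command write sit inside one critical section, and that the lock is a busy-wait lock that never sleeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PF's mailbox. */
struct mailbox {
    atomic_flag lock;        /* spinlock: holder never sleeps */
    uint32_t shmem_param;    /* data placed in shared memory */
    uint32_t cmd;            /* mailbox command word */
    uint32_t sampled_cmd;    /* what the simulated firmware saw */
    uint32_t sampled_param;
};

static void mb_init(struct mailbox *mb)
{
    assert(mb != NULL);
    atomic_flag_clear(&mb->lock);
    mb->shmem_param = mb->cmd = 0;
    mb->sampled_cmd = mb->sampled_param = 0;
}

static void mb_lock(struct mailbox *mb)
{
    while (atomic_flag_test_and_set_explicit(&mb->lock, memory_order_acquire))
        ; /* busy-wait instead of sleeping */
}

static void mb_unlock(struct mailbox *mb)
{
    atomic_flag_clear_explicit(&mb->lock, memory_order_release);
}

/* Data and command are written under the same lock, so a command can
 * never be paired with the data of a different command. */
static void mb_send(struct mailbox *mb, uint32_t cmd, uint32_t param)
{
    mb_lock(mb);
    mb->shmem_param = param;
    mb->cmd = cmd;
    /* Simulated firmware sampling of the mailbox. */
    mb->sampled_cmd = mb->cmd;
    mb->sampled_param = mb->shmem_param;
    mb_unlock(mb);
}
```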
Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 9 Mar 2016 07:16:23 +0000 (09:16 +0200)]
qed: Prevent MF link notifications
When device is configured for Multi-function mode, some older management
firmware might incorrectly notify interfaces of link changes while they
haven't requested the physical link configuration to be set.
This can create bizarre race conditions where unloading interfaces are
getting notified that the link is up.
Let the driver compensate - store the logical requested state of the link
and don't propagate notifications after protocol driver explicitly
requires the link to be unset.
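
The compensation can be pictured roughly as follows. This is only a sketch: struct link_ctx, link_request() and link_notify() are hypothetical names, not the qed API. The driver records what the protocol driver last asked for and drops firmware notifications while the link is requested to be unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface link bookkeeping. */
struct link_ctx {
    bool requested_up;  /* logical state requested by the protocol driver */
    bool reported_up;   /* last state propagated to the interface */
};

/* Protocol driver asks for the link to be set or unset. */
static void link_request(struct link_ctx *c, bool up)
{
    assert(c != NULL);
    c->requested_up = up;
    if (!up)
        c->reported_up = false;
}

/* Firmware notification; returns true if it was propagated. */
static bool link_notify(struct link_ctx *c, bool fw_link_up)
{
    assert(c != NULL);
    if (!c->requested_up)
        return false;   /* link was explicitly unset: ignore the firmware */
    c->reported_up = fw_link_up;
    return true;
}
```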
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Mar 2016 20:14:27 +0000 (15:14 -0500)]
Merge branch 'bpf-flow-labels'
Daniel Borkmann says:
====================
BPF support for flow labels
This set adds support for tunnel key flow labels for vxlan
and geneve devices in collect meta data mode and eBPF support
for managing these. For details please see individual patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 9 Mar 2016 02:00:05 +0000 (03:00 +0100)]
bpf: support flow label for bpf_skb_{set, get}_tunnel_key
This patch extends bpf_tunnel_key with a tunnel_label member, that maps
to ip_tunnel_key's label so underlying backends like vxlan and geneve
can propagate the label to udp_tunnel6_xmit_skb(), where it's being set
in the IPv6 header. It allows for having 20 more bits to encode/decode
flow related meta information programmatically. Tested with vxlan and
geneve.
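
For orientation, the 20 bits in question are the low bits of the first 32-bit word of the IPv6 header, which packs version (4 bits), traffic class (8 bits) and flow label (20 bits). The sketch below shows that layout in host byte order; the helper names are hypothetical and this is not the kernel's ip6_flow_hdr().

```c
#include <assert.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu  /* low 20 bits of the first header word */

/* Build the first IPv6 header word (host byte order):
 * version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
static uint32_t ipv6_first_word(uint8_t tclass, uint32_t label)
{
    assert((label & ~FLOWLABEL_MASK) == 0);  /* label must fit in 20 bits */
    return (6u << 28) | ((uint32_t)tclass << 20) | label;
}

/* Extract the flow label from the first header word. */
static uint32_t ipv6_flow_label(uint32_t first_word)
{
    return first_word & FLOWLABEL_MASK;
}
```

On the wire the word is in network byte order, so real code would convert with htonl() and ntohl() around these helpers.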
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 9 Mar 2016 02:00:04 +0000 (03:00 +0100)]
geneve: support setting IPv6 flow label
This work adds support for setting the IPv6 flow label for geneve per
device and through collect metadata (ip_tunnel_key) frontends. Here too,
the geneve dst cache does not need any special considerations: for the
cases where caches can be used, the label is static per cache.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 9 Mar 2016 02:00:03 +0000 (03:00 +0100)]
vxlan: support setting IPv6 flow label
This work adds support for setting the IPv6 flow label for vxlan per
device and through collect metadata (ip_tunnel_key) frontends. The
vxlan dst cache does not need any special considerations here: for
the cases where caches can be used, the label is static per cache.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 9 Mar 2016 02:00:02 +0000 (03:00 +0100)]
ip_tunnel: add support for setting flow label via collect metadata
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 Mar 2016 21:54:56 +0000 (13:54 -0800)]
cisco: enic: Update logging macros and uses
Don't hide variables used by the logging macros.
Miscellanea:
o Use the more common ##__VA_ARGS__ extension
o Add missing newlines to formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Mar 2016 19:59:55 +0000 (14:59 -0500)]
Merge branch 'bridge_ageing_time'
Stephen Hemminger says:
====================
bridge: ageing timer regression fix
This fixes a regression in how the ageing timer is managed.
Backing out the change required fixing switch drivers as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 8 Mar 2016 20:59:35 +0000 (12:59 -0800)]
bridge: allow zero ageing time
This fixes a regression in the bridge ageing time caused by commit
c62987bbd8a1 ("bridge: push bridge setting ageing_time down to switchdev").
There are users of Linux bridge which use the feature that if ageing time
is set to 0 it causes entries to never expire. See:
https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
For a pure software bridge, it is unnecessary for the code to have
arbitrary restrictions on what values are allowable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 8 Mar 2016 20:59:34 +0000 (12:59 -0800)]
rocker: set FDB cleanup timer according to lowest ageing time
In rocker, ageing time is a per-port attribute, so the next time the FDB
cleanup timer fires should be set according to the lowest ageing time.
This will later allow us to delete the BR_MIN_AGEING_TIME macro, which was
added to guarantee minimum ageing time in the bridge layer, thereby breaking
existing behavior.
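
Picking the timer interval from the lowest per-port value could look like the sketch below. struct port and lowest_ageing_time() are hypothetical names, and the fallback used when no port has a lower value is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port state; only the ageing time matters here. */
struct port {
    unsigned long ageing_time;
};

/* Return the lowest ageing time across all ports, or the fallback if
 * there are no ports or none is below it. */
static unsigned long lowest_ageing_time(const struct port *ports, size_t n,
                                        unsigned long fallback)
{
    unsigned long lowest = fallback;

    assert(ports != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ports[i].ageing_time < lowest)
            lowest = ports[i].ageing_time;
    return lowest;
}
```

Re-arming the cleanup timer with this value means that entries on the port with the shortest ageing time are not kept past their expiry just because other ports age more slowly.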
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 8 Mar 2016 20:59:33 +0000 (12:59 -0800)]
mlxsw: spectrum: Check requested ageing time is valid
Commit c62987bbd8a1 ("bridge: push bridge setting ageing_time down to
switchdev") added a check for minimum and maximum ageing time, but this
breaks existing behaviour where one can set ageing time to 0 for a
non-learning bridge.
Push this check down to the driver and allow the check in the bridge
layer to be removed. Currently ageing time 0 is refused by the driver,
but we can later add support for this functionality.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Tue, 8 Mar 2016 20:18:54 +0000 (15:18 -0500)]
macvtap: always pass ethernet header in linear
The stack expects link layer headers in the skb linear section.
Macvtap can create skbs with llheader in frags in edge cases:
when (IFF_VNET_HDR is off or vnet_hdr.hdr_len < ETH_HLEN) and
prepad + len > PAGE_SIZE and vnet_hdr.flags has no or bad csum.
Add checks to ensure linear is always at least ETH_HLEN.
At this point, len is already ensured to be >= ETH_HLEN.
For backwards compatibility, round up short vnet_hdr.hdr_len.
This differs from tap and packet, which return an error.
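
The rounding rule can be illustrated with a small helper. linear_len() is a hypothetical name and this is a sketch of the rule only, not the macvtap code; clamping the result to the packet length is an assumption made here for completeness.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14  /* Ethernet header: 6 + 6 + 2 bytes */

/* Hypothetical helper: number of bytes to place in the linear area.
 * hdr_len_hint is 0 when no virtio-net header is in use. */
static size_t linear_len(size_t hdr_len_hint, size_t len)
{
    assert(len >= ETH_HLEN);    /* already ensured by the caller */

    size_t linear = hdr_len_hint;
    if (linear < ETH_HLEN)
        linear = ETH_HLEN;      /* round up a short hint instead of failing */
    if (linear > len)
        linear = len;           /* assumption: never exceed the packet */
    return linear;
}
```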
Fixes: b9fb9ee07e67 ("macvtap: add GSO/csum offload support")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Fri, 11 Mar 2016 09:08:45 +0000 (11:08 +0200)]
net/flower: Fix pointer cast
Cast pointer to unsigned long instead of u64, to fix compilation warning
on 32 bit arch, spotted by 0day build.
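
The warning arises because a pointer is 32 bits wide on such architectures while u64 is 64 bits, so the cast changes size. A minimal sketch of the pointer-sized round trip follows; cookie_from_ptr() and ptr_from_cookie() are hypothetical names. It assumes an ABI where unsigned long is as wide as a pointer, which holds on Linux; portable user-space code would typically use uintptr_t instead.

```c
#include <assert.h>

/* Assumption: unsigned long is pointer-sized on this ABI (true on Linux). */
static_assert(sizeof(unsigned long) == sizeof(void *),
              "unsigned long must be pointer-sized");

/* Store a pointer in an integer cookie without changing its width. */
static unsigned long cookie_from_ptr(const void *p)
{
    return (unsigned long)p;
}

/* Recover the pointer from the cookie. */
static const void *ptr_from_cookie(unsigned long cookie)
{
    return (const void *)cookie;
}
```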
Fixes: 5b33f48 ("net/flower: Introduce hardware offload support")
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Mar 2016 21:24:03 +0000 (16:24 -0500)]
Merge branch 'flower-offload'
Amir Vadai says:
====================
cls_flower hardware offload support
Please see changes from V2 at the bottom.
This patchset introduces cls_flower hardware offload support on the ConnectX-4
driver; other hardware vendors are welcome to use it too.
This patchset is based on John's infrastructure for tc offloading [2] to add
hardware offload support to the flower filter. It also extends the support to
an additional tc action - skbedit mark operation.
The NIC driver used is ConnectX-4. The feature is off by default and can be
turned on using ethtool.
Some commands to use this code:
export TC=../iproute2/tc/tc
export ETH=ens9
ethtool -K $ETH hw-tc-offload on
$TC qdisc add dev $ETH ingress
$TC filter add dev $ETH protocol ip prio 20 parent ffff: \
flower ip_proto 1 \
dst_mac 7c:fe:90:69:81:62 \
src_mac 7c:fe:90:69:81:56 \
dst_ip 11.11.11.11 \
src_ip 11.11.11.12 \
indev $ETH \
action drop
$TC filter add dev $ETH protocol ip prio 30 parent ffff: \
flower ip_proto 6 \
indev $ETH \
action skbedit mark 0x1234
$TC filter add dev $ETH protocol ip prio 10 parent ffff: \
handle 0x1234 fw action pass
The code was tested and applied on top of commit
3ebeac1 ("Merge branch 'cxgb4-next'")
Changes from V2:
- patch 1/10 ("net/flower: Introduce hardware offload support")
- Remove unused variable [Dave]
- Don't fail command when HW can't offload filter [John]
- patch 3/10 ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")
- Mention in changelog that struct tc_action is now exposed out of the ifdef.
- patch 4/10 ("net/act_skbedit: Utility functions for mark action")
- Document clearly that is_tcf_skbedit_mark() is returning true if and only
if the only action is mark [Dave]
- patch 8/10 ("net/mlx5e: Introduce tc offload support")
- make mlx5e_tc_add_flow() static
Changes from V1:
- patch 3/10 ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")
- fixed return value of tc_no_actions
Changes from V0:
- Use tc_no_actions and tc_for_each_action instead of ifdef CONFIG_NET_CLS_ACT
- Replace ENOTSUPP (and some EINVAL) with EOPNOTSUPP
- Name the flower command enum
- fl_hw_destroy_filter() to return void - nobody uses the return value
- mlx5e_tc_init() and mlx5e_tc_cleanup() to be called from the right places.
- When adding HW rule fails - fail the command
- Rules are added to be processed both by HW and SW unless SKIP_HW is given
- Adding patch 6/10 ("net/mlx5e: Relax ndo_setup_tc handle restriction")
Main changes from the RFC [1]:
- API
- Using ndo_setup_tc() instead of switchdev
- act_skbedit, act_gact
- Actions are not serialized to NIC driver, instead using access functions.
- cls_flower
- prevent double classification by software by not adding
successfully offloaded filters to the hashtable
- Fixed some bugs in original RFC with rule delete
- mlx5
- Adding flow table to kernel namespace instead of a new namespace
- s/offload/tc/ in many places
- no need for a special kconfig since switchdev is not used
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 8 Mar 2016 10:42:38 +0000 (12:42 +0200)]
net/mlx5e: Support offload cls_flower with skbedit mark action
Introduce offloading of skbedit mark action.
For example, to mark with 0x1234, all TCP (ip_proto 6) packets arriving
to interface ens9:
# tc qdisc add dev ens9 ingress
# tc filter add dev ens9 protocol ip parent ffff: \
flower ip_proto 6 \
indev ens9 \
action skbedit mark 0x1234
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 8 Mar 2016 10:42:37 +0000 (12:42 +0200)]
net/mlx5e: Support offload cls_flower with drop action
Parse tc_cls_flower_offload into device specific commands and program
the hardware to classify and act accordingly.
For example, to drop ICMP (ip_proto 1) packets from specific smac, dmac,
dst_ip, src_ip, arriving to interface ens9:
# tc qdisc add dev ens9 ingress
# tc filter add dev ens9 protocol ip parent ffff: \
flower ip_proto 1 \
dst_mac 7c:fe:90:69:81:62 src_mac 7c:fe:90:69:81:56 \
dst_ip 11.11.11.11 src_ip 11.11.11.12 indev ens9 \
action drop
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 8 Mar 2016 10:42:36 +0000 (12:42 +0200)]
net/mlx5e: Introduce tc offload support
Extend ndo_setup_tc() to support ingress tc offloading. Will be used by
later patches to offload tc flower filter.
The feature is off by default and can be enabled by issuing:
# ethtool -K eth0 hw-tc-offload on
The offload flow table is dynamically created when the first filter is
added.
Rules are saved in a hash table that is maintained by the consumer (for
example - the flower offload in the next patch).
When the last filter is removed and no filters exist in the hash table, the
offload flow table is destroyed.
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 8 Mar 2016 10:42:35 +0000 (12:42 +0200)]
net/mlx5e: Add a new priority for kernel flow tables
Move the vlan and main flow tables to use priority 1. This will allow
the upcoming TC offload logic to use a higher priority (0) for the
offload steering table.
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>