Tushar Dave [Fri, 27 Oct 2017 23:12:30 +0000 (16:12 -0700)]
samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp1
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@lab bpf]#./xdp1 -N $(</sys/class/net/eth2/ifindex)
failed to create a map: 1 Operation not permitted
Fix it.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 27 Oct 2017 22:56:01 +0000 (15:56 -0700)]
net: dsa: b53: Export b53_configure_vlan()
bcm_sf2 and b53 replicate the same operations: clear all VLANs and set
their ports to the default VLAN tag (1 for these devices) so export the
b53 function doing just that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Fri, 27 Oct 2017 21:37:03 +0000 (14:37 -0700)]
liquidio: get rid of false alarm "Unknown cmd 27" in dmesg
Creating a macvtap interface with the liquidio VF driver as lower device
causes this alarming message to show up in dmesg:
liquidio_link_ctrl_cmd_completion Unknown cmd 27
That's actually a false alarm because cmd 27 is the value of the macro
OCTNET_CMD_SET_UC_LIST which is known. It's a control command sent from
host to NIC firmware to set the unicast MAC address list of the macvtap
lower device.
Make the false alarm go away by adding a case for OCTNET_CMD_SET_UC_LIST
in liquidio_link_ctrl_cmd_completion().
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 27 Oct 2017 19:36:38 +0000 (12:36 -0700)]
hv_netvsc: Set tx_table to equal weight after subchannels open
In some cases, like internal vSwitch, the host doesn't provide
send indirection table updates. This patch sets the table to be
equal weight after subchannels are all open. Otherwise, all workload
will be on one TX channel.
As tested, this patch has largely increased the throughput over
internal vSwitch.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matteo Croce [Fri, 27 Oct 2017 18:08:23 +0000 (20:08 +0200)]
ppp: allow usage in namespaces
Check for CAP_NET_ADMIN with ns_capable() instead of capable()
to allow usage of ppp in user namespace other than the init one.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 29 Oct 2017 02:20:28 +0000 (11:20 +0900)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2017-10-27
This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.
As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
Overview
========
Time-sensitive Networking (TSN) is a set of standards that aim to
address resources availability for providing bandwidth reservation and
bounded latency on Ethernet based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.
The initial target of this work is the Intel i210 NIC, but other
controllers' datasheet were also taken into account, like the Renesas
RZ/A1H RZ/A1M group and the Synopsis DesignWare Ethernet QoS
controller.
Proposal
========
Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.
For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:
$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
idleslope I
Note that the parameters for this qdisc are the ones defined by the
802.1Q-2014 spec, so no hardware specific functionality is exposed here.
Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.
v2: Merged patch 6 of the original series into patch 4 based on feedback
from David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 29 Oct 2017 02:16:22 +0000 (11:16 +0900)]
Merge branch 'l2tp-register-sessions-atomically'
Guillaume Nault says:
====================
l2tp: register sessions atomically
Currently l2tp_session_create() allocates a session, partially
initialises it and finally registers it. It therefore exposes sessions
that aren't fully initialised to the rest of the system, because
pseudo-wire specific initialisation can only happen after
l2tp_session_create() returns.
This leads to several crashes when these sessions are used or deleted.
This series starts by splitting session registration out of
l2tp_session_create() (patch #1). Thus allowing pseudo-wires code to
terminate the initialisation phase before registration.
Then patch #2 fixes the eth pseudo-wire code. This requires protecting
the session's netdevice pointer with RCU, because it still needs to be
updated concurrently after the session got registered.
Remaining patches take care of ppp pseudo-wires. RCU protection is
needed there too, for the same reasons. This time it's the pppol2tp
socket pointer that gets protected. For clarity, and since the
conversion requires more modifications, introducing RCU is done in
its own patch (#3). Then patch #4 only has to take care of fixing
sessions initialisation and registration (and adapting part of the
deletion process).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 27 Oct 2017 14:51:52 +0000 (16:51 +0200)]
l2tp: initialise PPP sessions before registering them
pppol2tp_connect() initialises L2TP sessions after they've been exposed
to the rest of the system by l2tp_session_register(). This puts
sessions into transient states that are the source of several races, in
particular with session's deletion path.
This patch centralises the initialisation code into
pppol2tp_session_init(), which is called before the registration phase.
The only field that can't be set before session registration is the
pppol2tp socket pointer, which has already been converted to RCU. So
pppol2tp_connect() should now be race-free.
The session's .session_close() callback is now set before registration.
Therefore, it's always called when l2tp_core deletes the session, even
if it was created by pppol2tp_session_create() and hasn't been plugged
to a pppol2tp socket yet. That'd prevent session free because the extra
reference taken by pppol2tp_session_close() wouldn't be dropped by the
socket's ->sk_destruct() callback (pppol2tp_session_destruct()).
We could set .session_close() only while connecting a session to its
pppol2tp socket, or teach pppol2tp_session_close() to avoid grabbing a
reference when the session isn't connected, but that'd require adding
some form of synchronisation to be race free.
Instead of that, we can just let the pppol2tp socket hold a reference
on the session as soon as it starts depending on it (that is, in
pppol2tp_connect()). Then we don't need to utilise
pppol2tp_session_close() to hold a reference at the last moment to
prevent l2tp_core from dropping it.
When releasing the socket, pppol2tp_release() now deletes the session
using the standard l2tp_session_delete() function, instead of merely
removing it from hash tables. l2tp_session_delete() drops the reference
the sessions holds on itself, but also makes sure it doesn't remove a
session twice. So it can safely be called, even if l2tp_core already
tried, or is concurrently trying, to remove the session.
Finally, pppol2tp_session_destruct() drops the reference held by the
socket.
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 27 Oct 2017 14:51:52 +0000 (16:51 +0200)]
l2tp: protect sock pointer of struct pppol2tp_session with RCU
pppol2tp_session_create() registers sessions that can't have their
corresponding socket initialised. This socket has to be created by
userspace, then connected to the session by pppol2tp_connect().
Therefore, we need to protect the pppol2tp socket pointer of L2TP
sessions, so that it can safely be updated when userspace is connecting
or closing the socket. This will eventually allow pppol2tp_connect()
to avoid generating transient states while initialising its parts of the
session.
To this end, this patch protects the pppol2tp socket pointer using RCU.
The pppol2tp socket pointer is still set in pppol2tp_connect(), but
only once we know the function isn't going to fail. It's eventually
reset by pppol2tp_release(), which now has to wait for a grace period
to elapse before it can drop the last reference on the socket. This
ensures that pppol2tp_session_get_sock() can safely grab a reference
on the socket, even after ps->sk is reset to NULL but before this
operation actually gets visible from pppol2tp_session_get_sock().
The rest is standard RCU conversion: pppol2tp_recv(), which already
runs in atomic context, is simply enclosed by rcu_read_lock() and
rcu_read_unlock(), while other functions are converted to use
pppol2tp_session_get_sock() followed by sock_put().
pppol2tp_session_setsockopt() is a special case. It used to retrieve
the pppol2tp socket from the L2TP session, which itself was retrieved
from the pppol2tp socket. Therefore we can just avoid dereferencing
ps->sk and directly use the original socket pointer instead.
With all users of ps->sk now handling NULL and concurrent updates, the
L2TP ->ref() and ->deref() callbacks aren't needed anymore. Therefore,
rather than converting pppol2tp_session_sock_hold() and
pppol2tp_session_sock_put(), we can just drop them.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 27 Oct 2017 14:51:51 +0000 (16:51 +0200)]
l2tp: initialise l2tp_eth sessions before registering them
Sessions must be initialised before being made externally visible by
l2tp_session_register(). Otherwise the session may be concurrently
deleted before being initialised, which can confuse the deletion path
and eventually lead to kernel oops.
Therefore, we need to move l2tp_session_register() down in
l2tp_eth_create(), but also handle the intermediate step where only the
session or the netdevice has been registered.
We can't just call l2tp_session_register() in ->ndo_init() because
we'd have no way to properly undo this operation in ->ndo_uninit().
Instead, let's register the session and the netdevice in two different
steps and protect the session's device pointer with RCU.
And now that we allow the session's .dev field to be NULL, we don't
need to prevent the netdevice from being removed anymore. So we can
drop the dev_hold() and dev_put() calls in l2tp_eth_create() and
l2tp_eth_dev_uninit().
Fixes:
d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 27 Oct 2017 14:51:50 +0000 (16:51 +0200)]
l2tp: don't register sessions in l2tp_session_create()
Sessions created by l2tp_session_create() aren't fully initialised:
some pseudo-wire specific operations need to be done before making the
session usable. Therefore the PPP and Ethernet pseudo-wires continue
working on the returned l2tp session while it's already been exposed to
the rest of the system.
This can lead to various issues. In particular, the session may enter
the deletion process before having been fully initialised, which will
confuse the session removal code.
This patch moves session registration out of l2tp_session_create(), so
that callers can control when the session is exposed to the rest of the
system. This is done by the new l2tp_session_register() function.
Only pppol2tp_session_create() can be easily converted to avoid
modifying its session after registration (the debug message is dropped
in order to avoid the need for holding a reference on the session).
For pppol2tp_connect() and l2tp_eth_create()), more work is needed.
That'll be done in followup patches. For now, let's just register the
session right after its creation, like it was done before. The only
difference is that we can easily take a reference on the session before
registering it, so, at least, we're sure it's not going to be freed
while we're working on it.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 29 Oct 2017 02:14:08 +0000 (11:14 +0900)]
tcp: Remove "linux/unaligned/access_ok.h" include.
This causes build failures:
In file included from net/ipv4/tcp_input.c:79:0:
./include/linux/unaligned/access_ok.h:7:28: error: redefinition of
'get_unaligned_le16'
In file included from ./include/asm-generic/unaligned.h:17:0,
from ./arch/arm/include/generated/asm/unaligned.h:1,
from net/ipv4/tcp_input.c:76:
./include/linux/unaligned/le_struct.h:6:19: note: previous definition
of 'get_unaligned_le16' was here
In file included from net/ipv4/tcp_input.c:79:0:
./include/linux/unaligned/access_ok.h:12:28: error: redefinition of
'get_unaligned_le32'
Plain "asm/access_ok.h", which is already included, is
sufficient.
Fixes:
60e2a7780793 ("tcp: TCP experimental option for SMC")
Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Fri, 27 Oct 2017 12:38:21 +0000 (18:08 +0530)]
cxgb3: Check and handle the dma mapping errors
This patch adds checks at approprate places whether *dma_map*() call has
succeeded or not.
Original Work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francois Romieu [Fri, 27 Oct 2017 10:24:49 +0000 (13:24 +0300)]
r8169: Add support for interrupt coalesce tuning (ethtool -C)
Kirr: In particular with
ethtool -C <ifname> rx-usecs 0 rx-frames 0
now it is possible to disable RX delays when NIC usage requires low-latency.
See this thread for context:
https://www.spinics.net/lists/netdev/msg217665.html
My specific case is that:
We have many computers with gigabit Realtek NICs. For 2 such computers
connected to a gigabit store-and-forward switch the minimum round-trip
time for small pings (`ping -i 0 -w 3 -s 56 -q peer`) is ~ 30μs.
However it turned out that when Ethernet frame length transitions 127 ->
128 bytes (`ping -i 0 -w 3 -s {81 -> 82} -q peer`) the lowest RTT
transitions step-wise to ~ 270μs.
As David Light said this is RX interrupt mitigation done by NIC which creates
the latency. For workloads when low-latency is required with e.g. Intel,
BCM etc NIC drivers one just uses `ethtool -C rx-usecs ...` to reduce
the time NIC delays before interrupting CPU, but it turned out
`ethtool -C` is not supported by r8169 driver.
Like Stéphane ANCELOT I've traced the problem down to IntrMitigate being
hardcoded to != 0 for our chips (we have 8168 based NICs):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n5460
static void rtl_hw_start_8169(struct net_device *dev) {
...
/*
* Undocumented corner. Supposedly:
* (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
*/
RTL_W16(IntrMitigate, 0x0000);
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n6346
static void rtl_hw_start_8168(struct net_device *dev) {
...
RTL_W16(IntrMitigate, 0x5151);
and then I've also found
https://www.spinics.net/lists/netdev/msg217665.html
and original Francois' patch:
https://www.spinics.net/lists/netdev/msg217984.html
https://www.spinics.net/lists/netdev/msg218207.html
So could we please finally get support for tuning r8169 interrupt
coalescing in tree? (so that next poor soul who hits the problem does
not need to go all the way to dig into driver sources and internet
wildly and finally patch locally
-RTL_W16(IntrMitigate, 0x5151);
+RTL_W16(IntrMitigate, 0x5100);
guessing whether it is right or not and also having to care to deploy
the patch everywhere it needs to be used, etc...).
To do so I've took original Francois's patch from 2012 and reworked it a bit:
- updated to latest net-next.git;
- adjusted scaling setup based on feedback from Hayes to pick up scaling
vector depending not only on link speed but also on CPlusCmd[0:1] and to
adjust CPlusCmd[0:1] correspondingly when setting timings;
- improved a bit (I think so) error handling.
I've tested the patch on "RTL8168d/8111d" (XID
083000c0) and with it and
`ethtool -C rx-usecs 0 rx-frames 0` on both ends it improves:
- minimum RTT latency:
~270μs -> ~30μs (small packet),
~330μs -> ~110μs (full 1.5K ethernet frame)
- average RTT latency:
~480μs -> ~50μs (small packet),
~560μs -> ~125μs (full 1.5K ethernet frame)
( before:
root@neo1:# ping -i 0 -w 3 -s 82 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
5906 packets transmitted, 5905 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.274/0.485/0.607/0.026 ms, ipg/ewma 0.508/0.489 ms
root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
5073 packets transmitted, 5073 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.330/0.566/0.710/0.028 ms, ipg/ewma 0.591/0.544 ms
after:
root@neo1# ping -i 0 -w 3 -s 82 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
45815 packets transmitted, 45815 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.036/0.051/0.368/0.010 ms, ipg/ewma 0.065/0.053 ms
root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
21250 packets transmitted, 21250 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.112/0.125/0.390/0.007 ms, ipg/ewma 0.141/0.125 ms
the small -> 1.5K latency growth is understandable as it takes ~15μs
to transmit 1.5K on 1Gbps on the wire and with 2 hosts and 1 switch
and ICMP ECHO + ECHO reply the packet has to travel 4 ethernet
segments which is already 60μs;
probably something a bit else is also there as e.g. on Linux, even
with `cpupower frequency-set -g performance`, on some computers I've
noticed the kernel can be spending more time in software-only mode
when incoming packets go in less frequently. E.g. this program can
demonstrate the effect for ICMP ECHO processing:
https://lab.nexedi.com/kirr/bcc/blob/
43cfc13b/tools/pinglat.py
(later this was found to be partly due to C-states exit latencies) )
We have this patch running in our testing setup for 1 months already
without any issues observed.
It remains to be clarified whether RX and TX timers use the same base.
For now I've set them equally, but Francois's original patch version
suggests it could be not the same.
I've got no feedback at all to my original posting of this patch and questions
https://www.spinics.net/lists/netdev/msg457173.html
neither from Francois, nor from any people from Realtek during one month.
So I suggest we simply apply it to net-next.git now.
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Stéphane ANCELOT <sancelot@free.fr>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 29 Oct 2017 02:03:44 +0000 (11:03 +0900)]
Merge branch 'bridge-make-setlink-dellink-notifications-more-accurate'
Nikolay Aleksandrov says:
====================
bridge: make setlink/dellink notifications more accurate
Before this set the bridge would generate a notification on vlan add or del
even if they didn't actually do any changes, which confuses listeners and
is generally not preferred. We could also lose notifications on actual
changes if one adds a range of vlans and there's an error in the middle.
The problem with just breaking and returning an error is that we could
break existing user-space scripts which rely on the vlan delete to clear
all existing entries in the specified range and ignore the non-existing
errors (typically used to clear the current vlan config).
So in order to make the notifications more accurate while keeping backwards
compatibility we add a boolean that tracks if anything actually changed
during the config calls.
The vlan add is more difficult to fix because it always returns 0 even if
nothing changed, but we cannot use a specific error because the drivers
can return anything and we may mask it, also we'd need to update all places
that directly return the add result, thus to signal that a vlan was created
or updated and in order not to break overlapping vlan range add we pass
down the new boolean that tracks changes to the add functions to check
if anything was actually updated.
v6: moved "changed" in else branch in br|nbp_vlan_add, thanks to
Toshiaki Makita and retested everything again
v5: fix br_vlan_add return (v1 leftover) spotted by Toshiaki Makita
v4: set changed always to false in the non-vlan config case and retested
v3: rebased to latest net-next and fixed non-vlan config functions reported
by kbuild test bot
v2: pass changed down to vlan add instead of masking errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Fri, 27 Oct 2017 10:19:37 +0000 (13:19 +0300)]
bridge: vlan: signal if anything changed on vlan add
Before this patch there was no way to tell if the vlan add operation
actually changed anything, thus we would always generate a notification
on adds. Let's make the notifications more precise and generate them
only if anything changed, so use the new bool parameter to signal that the
vlan was updated. We cannot return an error because there are valid use
cases that will be broken (e.g. overlapping range add) and also we can't
risk masking errors due to calls into drivers for vlan add which can
potentially return anything.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Fri, 27 Oct 2017 10:19:36 +0000 (13:19 +0300)]
bridge: netlink: make setlink/dellink notifications more accurate
Before this patch we had cases that either sent notifications when there
were in fact no changes (e.g. non-existent vlan delete) or didn't send
notifications when there were changes (e.g. vlan add range with an error in
the middle, port flags change + vlan update error). This patch sends down
a boolean to the functions setlink/dellink use and if there is even a
single configuration change (port flag, vlan add/del, port state) then
we always send a notification. This is all done to keep backwards
compatibility with the opportunistic vlan delete, where one could
specify a vlan range that has missing vlans inside and still everything
in that range will be cleared, this is mostly used to clear the whole
vlan config with a single call, i.e. range 1-4094.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 27 Oct 2017 17:01:39 +0000 (10:01 -0700)]
tcp: remove unnecessary include
two extra #include are not necessary in tcp.h
Remove them.
Fixes:
40304b2a1567 ("bpf: BPF support for sock_ops")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 28 Oct 2017 10:24:39 +0000 (19:24 +0900)]
Merge branch 'tcp-more-perns-sysctls'
Eric Dumazet says:
====================
tcp: move 12 sysctls to namespaces
Ideally all TCP sysctls should be per netns.
This patch series takes care of 12 sysctls.
Remains the ones that need discussion :
sysctl_tcp_mem, sysctl_tcp_rmem, sysctl_tcp_wmem, and sysctl_tcp_max_orphans
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:32 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_pacing_ca_ratio
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:31 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_pacing_ss_ratio
Also remove an obsolete comment about TCP pacing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:30 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_invalid_ratelimit
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:29 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_autocorking
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:28 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_min_rtt_wlen
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:27 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_min_tso_segs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:26 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_challenge_ack_limit
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:25 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_limit_output_bytes
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:24 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_workaround_signed_windows
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:23 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_tso_win_divisor
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:22 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_moderate_rcvbuf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Oct 2017 14:47:21 +0000 (07:47 -0700)]
tcp: Namespace-ify sysctl_tcp_nometrics_save
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:55:42 +0000 (22:55 -0700)]
drivers/net: smsc: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:55:34 +0000 (22:55 -0700)]
drivers/net: packetengines: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:55:27 +0000 (22:55 -0700)]
drivers/net: natsemi: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:55:20 +0000 (22:55 -0700)]
drivers/net: mellanox: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:55:13 +0000 (22:55 -0700)]
drivers/net: korina: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roman Yeryomin <leroi.lists@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:55:07 +0000 (22:55 -0700)]
drivers/net: fealnx: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:54:59 +0000 (22:54 -0700)]
drivers/net: dlink: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Denis Kirjanov <kda@linux-powerpc.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:54:53 +0000 (22:54 -0700)]
drivers/net: chelsio/cxgb*: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:54:45 +0000 (22:54 -0700)]
drivers/net: appletalk/cops: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:54:38 +0000 (22:54 -0700)]
drivers/net: amd: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 27 Oct 2017 05:54:25 +0000 (22:54 -0700)]
drivers/net: 8390: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Fri, 27 Oct 2017 04:10:35 +0000 (23:10 -0500)]
ipv6: exthdrs: use swap macro in ipv6_dest_hao
make use of the swap macro and remove unnecessary variable tmp_addr.
This makes the code easier to read and maintain.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhadram Varka [Fri, 27 Oct 2017 02:52:02 +0000 (08:22 +0530)]
stmmac: copy unicast mac address to MAC registers
Currently stmmac driver not copying the valid ethernet
MAC address to MAC registers. This patch takes care
of updating the MAC register with MAC address.
Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Cascón [Fri, 27 Oct 2017 00:35:38 +0000 (17:35 -0700)]
nfp: inform the VF driver needs to be restarted after changing the MAC
Add message to inform the VF MAC was changed and the need to restart
the VF driver for the changes to be effective.
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Thu, 26 Oct 2017 23:46:36 +0000 (16:46 -0700)]
liquidio: fix kernel panic in VF driver
Doing ifconfig down on VF driver in the middle of receiving line rate
traffic causes a kernel panic:
LiquidIO_VF 0000:02:00.3: should not come here should not get rx when poll mode = 0 for vf
BUG: unable to handle kernel NULL pointer dereference at (null)
.
.
.
Call Trace:
<IRQ>
? tasklet_action+0x102/0x120
__do_softirq+0x91/0x292
irq_exit+0xb6/0xc0
do_IRQ+0x4f/0xd0
common_interrupt+0x93/0x93
</IRQ>
RIP: 0010:cpuidle_enter_state+0x142/0x2f0
RSP: 0018:
ffffffffa6403e20 EFLAGS:
00000246 ORIG_RAX:
ffffffffffffff59
RAX:
0000000000000000 RBX:
0000000000000003 RCX:
000000000000001f
RDX:
0000000000000000 RSI:
000000002ab7519f RDI:
0000000000000000
RBP:
ffffffffa6403e58 R08:
0000000000000084 R09:
0000000000000018
R10:
ffffffffa6403df0 R11:
00000000000003c7 R12:
0000000000000003
R13:
ffffd27ebd806800 R14:
ffffffffa64d40d8 R15:
0000007be072823f
cpuidle_enter+0x17/0x20
call_cpuidle+0x23/0x40
do_idle+0x18c/0x1f0
cpu_startup_entry+0x64/0x70
rest_init+0xa5/0xb0
start_kernel+0x45e/0x46b
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x6f/0x72
secondary_startup_64+0xa5/0xa5
Code: Bad RIP value.
RIP: (null) RSP:
ffff9246ed003f28
CR2:
0000000000000000
---[ end trace
92731e80f31b7d7d ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x24000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Reason is: in the function assigned to net_device_ops->ndo_stop, the steps
for bringing down the interface are done in the wrong order. The step that
notifies the NIC firmware to stop forwarding packets to host is done too
late. Fix it by moving that step to the beginning.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 14:50:07 +0000 (10:50 -0400)]
net: dsa: move fixed link registration helpers
The new bindings (dsa2.c) and the old bindings (legacy.c) share two
helpers dsa_cpu_dsa_setup and dsa_cpu_dsa_destroy, used to register or
deregister a fixed PHY if a given port has a corresponding device node.
Unclutter the code by moving them into two new port.c helpers,
dsa_port_fixed_link_register_of and dsa_port_fixed_link_(un)register_of.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sat, 28 Oct 2017 05:56:10 +0000 (01:56 -0400)]
bnxt_en: Fix randconfig build errors.
Fix undefined symbols when CONFIG_VLAN_8021Q or CONFIG_INET is not set.
Fixes:
8c95f773b4a3 ("bnxt_en: add support for Flower based vxlan encap/decap offload")
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andre Guedes [Tue, 17 Oct 2017 01:01:28 +0000 (18:01 -0700)]
igb: Add support for CBS offload
This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from Traffic Control system. This support enable us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper
which is implemented in hardware by i210 controller.
The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.
FQTSS transmission mode from i210 controller is automatically enabled
by the IGB driver when the CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing FQTSS mode requires NIC
reset.
FQTSS feature is supported by i210 controller only.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vinicius Costa Gomes [Tue, 17 Oct 2017 01:01:27 +0000 (18:01 -0700)]
net/sched: Add support for HW offloading for CBS
This adds support for offloading the CBS algorithm to the controller,
if supported. Drivers wanting to support CBS offload must implement
the .ndo_setup_tc callback and handle the TC_SETUP_CBS (introduced
here) type.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vinicius Costa Gomes [Tue, 17 Oct 2017 01:01:26 +0000 (18:01 -0700)]
net/sched: Introduce Credit Based Shaper (CBS) qdisc
This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
It's primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.
Only a simple software implementation is added for now.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesus Sanchez-Palencia [Tue, 17 Oct 2017 01:01:25 +0000 (18:01 -0700)]
net/sched: Add select_queue() class_ops for mqprio
When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
the netdev_queue pointer that the current child qdisc is associated
with before creating the new qdisc.
Currently, when using mqprio as root qdisc, the kernel will end up
getting the queue #0 pointer from the mqprio (root qdisc), which leaves
any new child qdisc with a possibly wrong netdev_queue pointer.
Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
issue and avoid an inconsistent state when child qdiscs are replaced.
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesus Sanchez-Palencia [Tue, 17 Oct 2017 01:01:24 +0000 (18:01 -0700)]
net/sched: Change behavior of mq select_queue()
Currently, the class_ops select_queue() implementation on sch_mq
returns a pointer to netdev_queue #0 when it receives and invalid
qdisc id. That can be misleading since all of mq's inner qdiscs are
attached to a valid netdev_queue.
Here we fix that by returning NULL when a qdisc id is invalid. This is
aligned with how select_queue() is implemented for sch_mqprio in the
next patch on this series, keeping a consistent behavior between these
two qdiscs.
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesus Sanchez-Palencia [Tue, 17 Oct 2017 01:01:23 +0000 (18:01 -0700)]
net/sched: Check for null dev_queue on create flow
In qdisc_alloc() the dev_queue pointer was used without any checks
being performed. If qdisc_create() gets a null dev_queue pointer, it
just passes it along to qdisc_alloc(), leading to a crash. That
happens if a root qdisc implements select_queue() and returns a null
dev_queue pointer for an "invalid handle", for example, or if the
dev_queue associated with the parent qdisc is null.
This patch is in preparation for the next in this series, where
select_queue() is being added to mqprio and as it may return a null
dev_queue.
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Intiyaz Basha [Thu, 26 Oct 2017 23:18:20 +0000 (16:18 -0700)]
liquidio: xmit_more support
Defer ringing the Tx doorbell if skb->xmit_more is set unless the Tx queue
is full or stopped. To keep latency low, use a deferral limit of 8
packets. We chose 8 because Octeon can fetch at most 8 packets in a single
PCI read, and our tests show that 8 results in low latency.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Oct 2017 15:23:59 +0000 (00:23 +0900)]
Merge branch 'ibmvnic-Tunable-parameter-support'
John Allen says:
====================
ibmvnic: Tunable parameter support
This series implements support for changing tunable parameters such as the
mtu, number of tx/rx queues, and number of buffers per queue via ethtool
and ifconfig.
v2: -Fix conflict with Tom's recently applied TSO/SG patches
v3: -Initialize rc in __ibmvnic_reset fixing build warning
-Fix buggy behavior with pending mac changes. Use boolean flag
to track if mac change is needed on open rather than relying on
the desired->mac pointer.
-Directly include tunable structs in the adapter struct rather
than keeping pointers, eliminating the need to directly allocate
them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen [Thu, 26 Oct 2017 21:24:15 +0000 (16:24 -0500)]
ibmvnic: Fix failover error path for non-fatal resets
For all non-fatal reset conditions, the hypervisor will send a failover when
we attempt to initialize the crq and the vnic client is expected to handle
that failover instead of the existing non-fatal reset. To handle this, we
need to return from init with a return code that indicates that we have hit
this case.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen [Thu, 26 Oct 2017 21:23:25 +0000 (16:23 -0500)]
ibmvnic: Update reset infrastructure to support tunable parameters
Update ibmvnic reset infrastructure to include a new reset option that will
allow changing of tunable parameters. There currently is no way to request
different capabilities from the vnic server on the fly so this patch
achieves this by resetting the driver and attempting to log in with the
requested changes. If the reset operation fails, the old values of the
tunable parameters are stored in the "fallback" struct and we attempt to
login with the fallback values.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Oct 2017 15:10:24 +0000 (00:10 +0900)]
Merge branch 'qualcomm-rmnet-Add-64-bit-stats-and-GRO'
Subash Abhinov Kasiviswanathan says:
====================
net: qualcomm: rmnet: Add 64 bit stats and GRO
This series adds support for 64 bit per cpu stats and GRO
Patches 1-2 are cleanups of return code and a redundant condition
Patch 3 adds support for 64 bit per cpu stats
Patch 4 adds support for GRO using GRO cells
v1->v2: Since gro_cells_init() could potentially fail, move it from device
setup to ndo_init() as mentioned by Eric.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 26 Oct 2017 17:06:49 +0000 (11:06 -0600)]
net: qualcomm: rmnet: Add support for GRO
Add gro_cells so that rmnet devices can call gro_cells_receive
instead of netif_receive_skb.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 26 Oct 2017 17:06:48 +0000 (11:06 -0600)]
net: qualcomm: rmnet: Add support for 64 bit stats
Implement 64 bit per cpu stats.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 26 Oct 2017 17:06:47 +0000 (11:06 -0600)]
net: qualcomm: rmnet: Always assign rmnet dev in deaggregation path
The rmnet device needs to assigned for all packets in the
deaggregation path based on the mux id, so the check is not needed.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 26 Oct 2017 17:06:46 +0000 (11:06 -0600)]
net: qualcomm: rmnet: Fix the return value of rmnet_rx_handler()
Since packet is always consumed by rmnet_rx_handler(), we always
return RX_HANDLER_CONSUMED. There is no need to pass on this
value through multiple functions.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 26 Oct 2017 16:11:48 +0000 (09:11 -0700)]
MAINAINTERS: Add Doug as GENET maintainer
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Oct 2017 15:02:46 +0000 (00:02 +0900)]
Merge branch 'bnxt_en-next'
Michael Chan says:
====================
bnxt_en: Updates for net-next.
This series includes firmware interface update, some optimizations,
some new PCI IDs, new MTU checks, ethtool reset method, interrupt coalescing
code cleanup, and TC flower offload for vxlan encap/decap from Sathya
Perla.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 26 Oct 2017 15:51:32 +0000 (11:51 -0400)]
bnxt_en: alloc tc_info{} struct only when tc flower is enabled
TC flower is not enabled on VFs and when there's no FW support.
Alloc the tc_info{} struct at init time only when TC flower is being
enabled.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 26 Oct 2017 15:51:31 +0000 (11:51 -0400)]
bnxt_en: query cfa flow stats periodically to compute 'lastused' attribute
This patch implements periodic querying of cfa flow stats
in batches to compute the 'lastused' attribute of TC flow stats.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 26 Oct 2017 15:51:30 +0000 (11:51 -0400)]
bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter
Add routines for issuing the hwrm_cfa_encap_record_alloc/free
and hwrm_cfa_decap_filter_alloc/free FW cmds needed for
supporting vxlan encap/decap offload.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 26 Oct 2017 15:51:29 +0000 (11:51 -0400)]
bnxt_en: add support for Flower based vxlan encap/decap offload
This patch adds IPv4 vxlan encap/decap action support to TC-flower
offload.
For vxlan encap, the driver maintains a tunnel encap hash-table.
When a new flow with a tunnel encap action arrives, this table
is looked up; if an encap entry exists, it uses the already
programmed encap_record_handle as the tunnel_handle in the
hwrm_cfa_flow_alloc cmd. Else, a new encap node is added and the
L2 header fields are queried via a route lookup.
hwrm_cfa_encap_record_alloc cmd is used to create a new encap
record and the encap_record_handle is used as the tunnel_handle
while adding the flow.
For vxlan decap, the driver maintains a tunnel decap hash-table.
When a new flow with a tunnel decap action arrives, this table
is looked up; if a decap entry exists, it uses the already
programmed decap_filter_handle as the tunnel_handle in the
hwrm_cfa_flow_alloc cmd. Else, a new decap node is added and
a decap_filter_handle is alloc'd via the hwrm_cfa_decap_filter_alloc
cmd. This handle is used as the tunnel_handle while adding the flow.
The code to issue the HWRM FW cmds is introduced in a follow-up patch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:28 +0000 (11:51 -0400)]
bnxt_en: Refactor and simplify coalescing code.
The mapping of the ethtool coalescing parameters to hardware parameters
is now done in bnxt_hwrm_set_coal_params(). The same function can
handle both RX and TX settings. The code is now more clear. Some
adjustments have been made to get better hardware settings. The
coal_frames setting is now accurately set in hardware. The max_timer
is set to coal_ticks value.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:27 +0000 (11:51 -0400)]
bnxt_en: Reorganize the coalescing parameters.
The current IRQ coalescing logic is a little messy. The ethtool
parameters are mapped to hardware parameters in a way that is difficult
to understand. The first step is to better organize the parameters
by adding the new structure bnxt_coal. The structure is used by both
the RX and TX sets of coalescing parameters.
Adjust the default coal_ticks to 14 us and 28 us for RX and TX.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Thu, 26 Oct 2017 15:51:26 +0000 (11:51 -0400)]
bnxt_en: Add ethtool reset method
This is a firmware internal reset after driver is unloaded.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:25 +0000 (11:51 -0400)]
bnxt_en: Check maximum supported MTU from firmware.
Some NICs have a firmware enforced maximum MTU setting by management
firmware. Set up netdev->max_mtu accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:24 +0000 (11:51 -0400)]
bnxt_en: Optimize .ndo_set_mac_address() for VFs.
No need to call bnxt_approve_mac() which will send a message to the
PF if the MAC address hasn't changed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:23 +0000 (11:51 -0400)]
bnxt_en: Get firmware package version one time.
The current code retrieves the firmware package version from firmware
everytime ethtool -i is run. There is no reason to do that as the
firmware will not change while the driver is loaded. Get the version
once at init time.
Also, display the full 4-part firmware version string and remove the
less useful interface spec version.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:22 +0000 (11:51 -0400)]
bnxt_en: Check for zero length value in bnxt_get_nvram_item().
Return -EINVAL if the length is zero and not proceed to do essentially
nothing.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Miller [Thu, 26 Oct 2017 15:51:21 +0000 (11:51 -0400)]
bnxt_en: adding PCI ID for SMARTNIC VF support
Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ray Jui [Thu, 26 Oct 2017 15:51:20 +0000 (11:51 -0400)]
bnxt_en: Add PCIe device ID for bcm58804
Add new PCIe device ID and chip number for bcm58804
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 26 Oct 2017 15:51:19 +0000 (11:51 -0400)]
bnxt_en: Update firmware interface to 1.8.3.1
Vxlan encap/decap filters are added to this firmware spec.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Oct 2017 15:00:10 +0000 (00:00 +0900)]
Merge branch 'dsa-define-port-types'
Vivien Didelot says:
====================
net: dsa: define port types
The DSA code currently has 3 bitmaps in the dsa_switch structure:
cpu_port_mask, dsa_port_mask and enabled_port_mask.
They are used to store the type of each switch port. This dates back
from when DSA didn't have a dsa_port structure to hold port-specific
data.
The dsa_switch structure is mainly used to communicate with DSA drivers
and must not contain such static data parsed from DTS or pdata, which
belongs the DSA core structures, such as dsa_switch_tree and dsa_port.
Also the enabled_port_mask is misleading, often misinterpreted as the
complement of disabled ports (thus including DSA and CPU ports), while
in fact it only masks the user ports.
A port can be of 3 types when it is not unused: "cpu" (interfacing with
a master device), "dsa" (interconnecting with another "dsa" port from
another switch chip), or "user" (user-facing port.)
This patchset first fixes the usage of DSA port type helpers, then
defines the DSA_PORT_TYPE_UNUSED, DSA_PORT_TYPE_CPU, DSA_PORT_TYPE_DSA,
and DSA_PORT_TYPE_USER port types, and finally removes the misleading
port bitmaps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:59 +0000 (11:22 -0400)]
net: dsa: remove port masks
Now that DSA core provides port types, there is no need to keep this
information at the switch level. This is a static information that is
part of a DSA core dsa_port structure. Remove them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:58 +0000 (11:22 -0400)]
net: dsa: use new port type in helpers
Now that DSA exposes an enumerated type for the ports, we can use them
directly instead of checking bitmaps, which is more consistent.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:57 +0000 (11:22 -0400)]
net: dsa: define port types
Introduce an enumerated type for ports, which will be way more explicit
to identify a port type instead of digging into switch port masks.
A port can be of type CPU, DSA, user, or unused by default. This is a
static parsed information that cannot be changed at runtime.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:56 +0000 (11:22 -0400)]
net: dsa: introduce dsa_user_ports helper
Introduce a dsa_user_ports() helper to return the ds->enabled_port_mask
mask which is more explicit. This will also minimize diffs when touching
this internal mask.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:55 +0000 (11:22 -0400)]
net: dsa: use dsa_is_user_port everywhere
Most of the DSA code still check ds->enabled_port_mask directly to
inspect a given port type instead of using the provided dsa_is_user_port
helper. Change this.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:54 +0000 (11:22 -0400)]
net: dsa: rename dsa_is_normal_port helper
This patch renames dsa_is_normal_port to dsa_is_user_port because "user"
is the correct term in the DSA terminology, not "normal".
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:53 +0000 (11:22 -0400)]
net: dsa: fix dsa_is_normal_port helper
In order to know if a port is of type user, dsa_is_normal_port checks
that the given port is not of type DSA nor CPU. This is not enough
because a port can be unused.
Without the previous fix, this caused the unused mv88e6xxx ports to be
configured in normal mode.
The ds->enabled_port_mask reports the user ports, so check this instead.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:52 +0000 (11:22 -0400)]
net: dsa: mv88e6xxx: skip unused ports
The unused ports are currently configured in normal mode. This does not
prevent the switch from being functional, but it is unnecessary. Skip
unused ports.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 26 Oct 2017 15:22:51 +0000 (11:22 -0400)]
net: dsa: add dsa_is_unused_port helper
As the comment above the chunk states, the b53 driver attempts to
disable the unused ports. But using ds->enabled_port_mask is misleading,
because this mask reports in fact the user ports.
To avoid confusion and fix this, this patch introduces an explicit
dsa_is_unused_port helper which ensures the corresponding bit is not
masked in any of the switch port masks.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 26 Oct 2017 12:27:45 +0000 (07:27 -0500)]
net: faraday: ftmac100: Use BUG_ON instead of if condition followed by BUG.
Notice that in this particular case unlikely() is already being called
inside BUG_ON macro.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 26 Oct 2017 12:16:01 +0000 (07:16 -0500)]
net: bcmgenet: Use BUG_ON instead of if condition followed by BUG
Use BUG_ON instead of if condition followed by BUG.
Something to notice in this particular case is that unlikely()
is already being called inside BUG_ON macro.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Oct 2017 14:48:30 +0000 (23:48 +0900)]
Merge branch 'cxgb4-collect-more-hardware-dumps-via-ethtool'
Rahul Lakkireddy says:
====================
cxgb4: collect more hardware dumps via ethtool
This series of patches collect more firmware and hardware dumps
via ethool --get-dump facility.
Patch 1 collects hardware logic analyzer dumps.
Patch 2 collects CIM queue configuration dump.
Patch 3 collects RSS dumps.
Patch 4 collects TID info dump.
Patch 5 collects MPS-TCAM dump.
Patch 6 collects PBT tables dump.
Patch 7 collects hardware scheduler and pace table dumps.
Patch 8 collects miscellaneous hardware information, including
path mtu, PM stats, TP clock info, congestion control, and VPD
data dumps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:40 +0000 (17:18 +0530)]
cxgb4: collect hardware misc dumps
Collect path mtu, PM stats, TP clock info, congestion control, and VPD
data dumps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:39 +0000 (17:18 +0530)]
cxgb4: collect hardware scheduler dumps
Collect hardware TX traffic scheduler and pace tables.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:38 +0000 (17:18 +0530)]
cxgb4: collect PBT tables dump
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:37 +0000 (17:18 +0530)]
cxgb4: collect MPS-TCAM dump
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:36 +0000 (17:18 +0530)]
cxgb4: collect TID info dump
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:35 +0000 (17:18 +0530)]
cxgb4: collect RSS dumps
Collect RSS table and RSS VF configuration dumps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:34 +0000 (17:18 +0530)]
cxgb4: collect CIM queue configuration dump
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Thu, 26 Oct 2017 11:48:33 +0000 (17:18 +0530)]
cxgb4: collect hardware LA dumps
Collect CIM, CIM_MA, ULP_RX, TP, CIM_PIF, and ULP_TX logic analyzer
dumps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>