Daniel Borkmann [Mon, 28 Nov 2016 22:16:54 +0000 (23:16 +0100)]
bpf, xdp: allow to pass flags to dev_change_xdp_fd
Add an IFLA_XDP_FLAGS attribute that can be passed for setting up
XDP along with IFLA_XDP_FD, which eventually allows user space to
implement typical add/replace/delete logic for programs. Right now,
calling into dev_change_xdp_fd() will always replace previous programs.
When passed XDP_FLAGS_UPDATE_IF_NOEXIST, we can handle this more
graceful when requested by returning -EBUSY in case we try to
attach a new program, but we find that another one is already
attached. This will be used by upcoming front-end for iproute2 as
well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Nov 2016 15:22:28 +0000 (10:22 -0500)]
Merge branch 'broadcom-phy-internal-counters'
Florian Fainelli says:
====================
net: phy: broadcom: Support for PHY counters
This patch series adds support for reading the Broadcom PHYs internal counters.
Changes in v3:
- fixed the allocation of priv->stats in bcm7xxx
Changes in v2:
- fixed modular build reported by kbuild
- constify array of stats
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Nov 2016 17:57:18 +0000 (09:57 -0800)]
net: phy: bcm7xxx: Plug in support for reading PHY error counters
Broadcom BCM7xxx internal PHYs can leverage the Broadcom PHY library
module PHY error counters helper functions, just implement the
appropriate PHY driver function calls to do so. We need to allocate some
storage space for our PHY statistics, and provide it to the Broadcom PHY
library, so do this in a specific probe function, and slightly wrap the
get_stats function call.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Nov 2016 17:57:17 +0000 (09:57 -0800)]
net: phy: broadcom: Add support code for reading PHY counters
Broadcom PHYs expose a number of PHY error counters: receive errors,
false carrier sense, SerDes BER count, local and remote receive errors.
Add support code to allow retrieving these error counters. Since the
Broadcom PHY library code is used by several drivers, make it possible
for them to specify the storage for the software copy of the statistics.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 28 Nov 2016 18:55:34 +0000 (18:55 +0000)]
sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver
Rationale: The differences between Falcon and Siena are in many ways larger
than those between Siena and EF10 (despite Siena being nominally "Falcon-
architecture"); for instance, Falcon has no MCPU, so there is no MCDI.
Removing Falcon support from the sfc driver should simplify the latter,
and avoid the possibility of Falcon support being broken by changes to sfc
(which are rarely if ever tested on Falcon, it being end-of-lifed hardware).
The sfc-falcon driver created in this changeset is essentially a copy of the
sfc driver, but with Siena- and EF10-specific code, including MCDI, removed
and with the "efx_" identifier prefix changed to "ef4_" (for "EFX 4000-
series") to avoid collisions when both drivers are built-in.
This changeset removes Falcon from the sfc driver's PCI ID table; then in
sfc I've removed obvious Falcon-related code: I removed the Falcon NIC
functions, Falcon PHY code, and EFX_REV_FALCON_*, then fixed up everything
that referenced them.
Also, increment minor version of both drivers (to 4.1).
For now, CONFIG_SFC selects CONFIG_SFC_FALCON, so that updating old configs
doesn't cause Falcon support to disappear; but that should be undone at
some point in the future.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yegor Yefremov [Mon, 28 Nov 2016 09:47:52 +0000 (10:47 +0100)]
cpsw: ethtool: add support for nway reset
This patch adds support for ethtool's '-r' command. Restarting
N-WAY negotiation can be useful to activate newly changed EEE
settings etc.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Nov 2016 15:04:32 +0000 (10:04 -0500)]
Merge branch 'tcp-sender-chronographs'
Yuchung Cheng says:
====================
tcp: sender chronographs instrumentation
This patch set provides instrumentation on TCP sender limitations.
While developing the BBR congestion control, we noticed that TCP
sending process is often limited by factors unrelated to congestion
control: insufficient sender buffer and/or insufficient receive
window/buffer to saturate the network bandwidth. Unfortunately these
limits are not visible to the users and often the poor performance
is attributed to the congestion control of choice.
Thie patch aims to help users get the high level understanding of
where sending process is limited by, similar to the TCP_INFO design.
It is not to replace detailed kernel tracing and instrumentation
facilities.
In addition this patch set provide a new option to the timestamping
work to instrument these limits on application data unit. For exampe,
one can use SO_TIMESTAMPING and this patch set to measure the how
long a particular HTTP response is limited by small receive window.
Patch set was initially written by Francis Yan then polished
by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil
Hassas Yeganeh.
====================
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francis Yan [Mon, 28 Nov 2016 07:07:18 +0000 (23:07 -0800)]
tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation. For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.
To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francis Yan [Mon, 28 Nov 2016 07:07:17 +0000 (23:07 -0800)]
tcp: export sender limits chronographs to TCP_INFO
This patch exports all the sender chronograph measurements collected
in the previous patches to TCP_INFO interface. Note that busy time
exported includes all the other sending limits (rwnd-limited,
sndbuf-limited). Internally the time unit is jiffy but externally
the measurements are in microseconds for future extensions.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francis Yan [Mon, 28 Nov 2016 07:07:16 +0000 (23:07 -0800)]
tcp: instrument how long TCP is limited by insufficient send buffer
This patch measures the amount of time when TCP runs out of new data
to send to the network due to insufficient send buffer, while TCP
is still busy delivering (i.e. write queue is not empty). The goal
is to indicate either the send buffer autotuning or user SO_SNDBUF
setting has resulted network under-utilization.
The measurement starts conservatively by checking various conditions
to minimize false claims (i.e. under-estimation is more likely).
The measurement stops when the SOCK_NOSPACE flag is cleared. But it
does not account the time elapsed till the next application write.
Also the measurement only starts if the sender is still busy sending
data, s.t. the limit accounted is part of the total busy time.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francis Yan [Mon, 28 Nov 2016 07:07:15 +0000 (23:07 -0800)]
tcp: instrument how long TCP is limited by receive window
This patch measures the total time when the TCP stops sending because
the receiver's advertised window is not large enough. Note that
once the limit is lifted we are likely in the busy status if we
have data pending.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francis Yan [Mon, 28 Nov 2016 07:07:14 +0000 (23:07 -0800)]
tcp: instrument how long TCP is busy sending
This patch measures TCP busy time, which is defined as the period
of time when sender has data (or FIN) to send. The time starts when
data is buffered and stops when the write queue is flushed by ACKs
or error events.
Note the busy time does not include SYN time, unless data is
included in SYN (i.e. Fast Open). It does include FIN time even
if the FIN carries no payload. Excluding pure FIN is possible but
would incur one additional test in the fast path, which may not
be worth it.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francis Yan [Mon, 28 Nov 2016 07:07:13 +0000 (23:07 -0800)]
tcp: instrument tcp sender limits chronographs
This patch implements the skeleton of the TCP chronograph
instrumentation on sender side limits:
1) idle (unspec)
2) busy sending data other than 3-4 below
3) rwnd-limited
4) sndbuf-limited
The limits are enumerated 'tcp_chrono'. Since a connection in
theory can idle forever, we do not track the actual length of this
uninteresting idle period. For the rest we track how long the sender
spends in each limit. At any point during the life time of a
connection, the sender must be in one of the four states.
If there are multiple conditions worthy of tracking in a chronograph
then the highest priority enum takes precedence over
the other conditions. So that if something "more interesting"
starts happening, stop the previous chrono and start a new one.
The time unit is jiffy(u32) in order to save space in tcp_sock.
This implies application must sample the stats no longer than every
49 days of 1ms jiffy.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yegor Yefremov [Mon, 28 Nov 2016 08:41:33 +0000 (09:41 +0100)]
cpsw: ethtool: add support for getting/setting EEE registers
Add the ability to query and set Energy Efficient Ethernet parameters
via ethtool for applicable devices.
This patch doesn't activate full EEE support in cpsw driver, but it
enables reading and writing EEE advertising settings. This way one
can disable advertising EEE for certain speeds.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Mon, 28 Nov 2016 17:25:44 +0000 (18:25 +0100)]
hv_netvsc: remove excessive logging on MTU change
When we change MTU or the number of channels on a netvsc device we get the
following logged:
hv_netvsc
bf5edba8...: net device safe to remove
hv_netvsc: hv_netvsc channel opened successfully
hv_netvsc
bf5edba8...: Send section size: 6144, Section count:2560
hv_netvsc
bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up
This information is useful as debug at most.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Nov 2016 01:48:52 +0000 (20:48 -0500)]
Merge branch 'mlxsw-enhancements'
Jiri Pirko says:
====================
mlxsw: couple of enhancements and fixes
Couple of enhancements and fixes from Ido.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 28 Nov 2016 17:01:26 +0000 (18:01 +0100)]
mlxsw: core: Change order of operations in removal path
We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.
This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 28 Nov 2016 17:01:25 +0000 (18:01 +0100)]
mlxsw: core: Add missing rollback in error path
Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.
Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.
Fixes:
a50c1e35650b ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 28 Nov 2016 17:01:24 +0000 (18:01 +0100)]
mlxsw: spectrum_buffers: Limit size of pools
The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.
While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.
Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.
Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size
999999999 thtype
dynamic
// No error is returned
With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size
999999999 thtype
dynamic
devlink answers: Invalid argument
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 28 Nov 2016 17:01:23 +0000 (18:01 +0100)]
mlxsw: resources: Add maximum buffer size
We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 28 Nov 2016 14:26:04 +0000 (15:26 +0100)]
mlxsw: switchib: add MLXSW_PCI dependency
The newly added switchib driver fails to link if MLXSW_PCI=m:
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit':
switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'
The other two such sub-drivers have a dependency, so add the same one
here. In theory we could allow this driver if MLXSW_PCI is disabled,
but it's probably not worth it.
Fixes:
d1ba52638456 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Mon, 28 Nov 2016 14:18:31 +0000 (15:18 +0100)]
net: phy: Fix use after free in phy_detach()
If device_release_driver(&phydev->mdio.dev) is called, it releases all
resources belonging to the PHY device. Hence the subsequent call to
phy_led_triggers_unregister() will access already freed memory when
unregistering the LEDs.
Move the call to phy_led_triggers_unregister() before the possible call
to device_release_driver() to fix this.
Fixes:
2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Mon, 28 Nov 2016 11:55:59 +0000 (12:55 +0100)]
stmmac: fix comments, make debug output consistent
Fix comments, add some new, and make debugfs output consistent.
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Mon, 28 Nov 2016 13:11:04 +0000 (14:11 +0100)]
bpf: cgroup: fix documentation of __cgroup_bpf_update()
There's a 'not' missing in one paragraph. Add it.
Fixes:
3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reported-by: Rami Rosen <roszenrami@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Nov 2016 00:38:31 +0000 (19:38 -0500)]
Merge branch 'eee-broken-modes'
Jerome Brunet says:
====================
Fix OdroidC2 Gigabit Tx link issue
This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
The platform seems to enter LPI on the Rx path too often while performing
relatively high TX transfer. This eventually break the link (both Tx and
Rx), and require to bring the interface down and up again to get the Rx
path working again.
The root cause of this issue is not fully understood yet but disabling EEE
advertisement on the PHY prevent this feature to be negotiated.
With this change, the link is stable and reliable, with the expected
throughput performance.
The patchset adds options in the generic phy driver to disable EEE
advertisement, through device tree. The way it is done is very similar
to the handling of the max-speed property.
Changes since V2: [2]
- Rename "eee-advert-disable" to "eee-broken-modes" to make the intended
purpose of this option clear (flag broken configuration, not a
configuration option)
- Add DT bindings constants so the DT configuration is more user friendly
- Submit to net-next instead of net.
Changes since V1: [1]
- Disable the advertisement of EEE in the generic code instead of the
realtek driver.
[1] : http://lkml.kernel.org/r/
1479220154-25851-1-git-send-email-jbrunet@baylibre.com
[2] : http://lkml.kernel.org/r/
1479742524-30222-1-git-send-email-jbrunet@baylibre.com
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
jbrunet [Mon, 28 Nov 2016 09:46:48 +0000 (10:46 +0100)]
dt: bindings: add ethernet phy eee-broken-modes option documentation
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
jbrunet [Mon, 28 Nov 2016 09:46:47 +0000 (10:46 +0100)]
dt-bindings: net: add EEE capability constants
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
jbrunet [Mon, 28 Nov 2016 09:46:46 +0000 (10:46 +0100)]
net: phy: add an option to disable EEE advertisement
This patch adds an option to disable EEE advertisement in the generic PHY
by providing a mask of prohibited modes corresponding to the value found in
the MDIO_AN_EEE_ADV register.
On some platforms, PHY Low power idle seems to be causing issues, even
breaking the link some cases. The patch provides a convenient way for these
platforms to disable EEE advertisement and work around the issue.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Thu, 24 Nov 2016 14:36:33 +0000 (15:36 +0100)]
net: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queues
The dwmac4 IP can synthesized with 1-8 number of tx queues.
On an IP synthesized with DWC_EQOS_NUM_TXQ > 1, all txqueues are disabled
by default. For these IPs, the bitfield TXQEN is R/W.
Always enable tx queue 0. The write will have no effect on IPs synthesized
with DWC_EQOS_NUM_TXQ == 1.
The driver does still not utilize more than one tx queue in the IP.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Robinson [Mon, 28 Nov 2016 07:12:37 +0000 (07:12 +0000)]
net: arc_emac: add dependencies on associated arches and compile test
Add dependencies on the architectures that support these devices and
add compile test to ensure ongoing code build coverage.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Färber [Sun, 27 Nov 2016 22:59:51 +0000 (23:59 +0100)]
MAINTAINERS: Add device tree bindings to mv88e6xx section
Also include the netdev list for convenience, as done elsewhere.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Nov 2016 15:46:20 +0000 (07:46 -0800)]
mlx4: give precise rx/tx bytes/packets counters
mlx4 stats are chaotic because a deferred work queue is responsible
to update them every 250 ms.
Even sampling stats every one second with "sar -n DEV 1" gives
variations like the following :
lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22 eth0 146877.00 3265554.00 9467.15 4828168.50
07:39:23 eth0 146587.00 3260329.00 9448.15 4820445.98
07:39:24 eth0 146894.00 3259989.00 9468.55 4819943.26
07:39:25 eth0 110368.00 2454497.00 7113.95 3629012.17 <<>>
07:39:26 eth0 146563.00 3257502.00 9447.25 4816266.23
07:39:27 eth0 145678.00 3258292.00 9389.79 4817414.39
07:39:28 eth0 145268.00 3253171.00 9363.85 4809852.46
07:39:29 eth0 146439.00 3262185.00 9438.97 4823172.48
07:39:30 eth0 146758.00 3264175.00 9459.94 4826124.13
07:39:31 eth0 146843.00 3256903.00 9465.44 4815381.97
Average: eth0 142827.50 3179259.70 9206.30 4700578.16
This patch allows rx/tx bytes/packets counters being folded at the
time we need stats.
We now can fetch stats every 1 ms if we want to check NIC behavior
on a small time window. It is also easier to detect anomalies.
lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50 eth0 142915.00 3177696.00 9212.06 4698270.42
07:42:51 eth0 143741.00 3200232.00 9265.15 4731593.02
07:42:52 eth0 142781.00 3171600.00 9202.92 4689260.16
07:42:53 eth0 143835.00 3192932.00 9271.80 4720761.39
07:42:54 eth0 141922.00 3165174.00 9147.64 4679759.21
07:42:55 eth0 142993.00 3207038.00 9216.78 4741653.05
07:42:56 eth0 141394.06 3154335.64 9113.85 4663731.73
07:42:57 eth0 141850.00 3161202.00 9144.48 4673866.07
07:42:58 eth0 143439.00 3180736.00 9246.05 4702755.35
07:42:59 eth0 143501.00 3210992.00 9249.99 4747501.84
Average: eth0 142835.66 3182165.93 9206.98 4704874.08
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Mon, 28 Nov 2016 05:26:58 +0000 (13:26 +0800)]
geneve: fix ip_hdr_len reserved for geneve6 tunnel.
It shold reserved sizeof(ipv6hdr) for geneve in ipv6 tunnel.
Fixes:
c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 21:07:43 +0000 (16:07 -0500)]
Merge branch 'phy-doc-improvements'
Florian Fainelli says:
====================
Documentation: net: phy: Improve documentation
This patch series addresses discussions and feedback that was recently received
on the mailing-list in the area of: flow control/pause frames, interpretation of
phy_interface_t and finally add some links to useful standards documents.
Changes in v3:
- add Timur's feedback into patch 3
Changes in v2:
- clarify a few things in the RGMII section, add a paragraph about common issues
with RGMII delay mismatches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 28 Nov 2016 02:45:15 +0000 (18:45 -0800)]
Documentation: net: phy: Add links to several standards documents
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 28 Nov 2016 02:45:14 +0000 (18:45 -0800)]
Documentation: net: phy: Add blurb about RGMII
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 28 Nov 2016 02:45:13 +0000 (18:45 -0800)]
Documentation: net: phy: Add a paragraph about pause frames/flow control
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 28 Nov 2016 02:45:12 +0000 (18:45 -0800)]
Documentation: net: phy: remove description of function pointers
Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Färber [Sun, 27 Nov 2016 22:26:28 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.
Fixes:
dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 20:09:37 +0000 (15:09 -0500)]
Merge branch 'mlx5-DCBX-and-ethtool-updates'
Saeed Mahameed says:
====================
Mellanox 100G mlx5 DCBX and ethtool updates
This series provides the following mlx5 updates:
From Huy:
DCBX CEE API and DCBX firmware/host modes support.
- 1st patch ensures the dcbnl_rtnl_ops is published only when the qos
capability bits is on.
- 2nd patch adds the support for CEE interfaces into mlx5 dcbnl_rtnl_ops
- 3rd patch refactors ETS query to read ETS configuration directly from
firmware rather than having a software shadow to it. The existing IEEE
interfaces stays the same.
- 4th patch adds the support for MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP
firmware commands to manipulate mlx5 DCBX mode.
- 5th patch adds the driver support for the new DCBX firmware. This ensures
the backward compatibility versus the old and new firmware. With the new DCBX
firmware, qos settings can be controlled by either firmware or software
depending on the DCBX mode.
From Kamal and Saeed:
- mlx5 self-test support.
From Shaker:
- Private flag to give the user the ability to enable/disable mlx5 CQE
compression.
V1->V2:
- Check ETS capability where needed in:
("net/mlx5e: Read ETS settings directly from firmware")
- Fix return value of mlx5e_dcbnl_switch_to_host_mode in:
("net/mlx5e: ConnectX-4 firmware support for DCBX")
- Update commit message of:
("net/mlx5e: ConnectX-4 firmware support for DCBX")
- Fix two sparse static check warnings in en_selftest.c
This series was generated against commit:
e5f12b3f5ebb ("Merge branch 'mlxsw-trap-groups-and-policers'")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaker Daibes [Sun, 27 Nov 2016 15:02:12 +0000 (17:02 +0200)]
net/mlx5e: Add CQE compression user control
The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaker Daibes [Sun, 27 Nov 2016 15:02:11 +0000 (17:02 +0200)]
net/mlx5e: Moves pflags to priv->params
pflags is a configuration parameter for the netdev, naturally it belongs
to priv->params.
Also introduce MLX5E_GET_PFLAG
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 27 Nov 2016 15:02:10 +0000 (17:02 +0200)]
net/mlx5e: Add support for loopback selftest
Extend the self diagnostic tests to support loopback test.
The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Sun, 27 Nov 2016 15:02:09 +0000 (17:02 +0200)]
net/mlx5e: Add support for ethtool self diagnostics test
The self diagnostics test implementaion include the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:08 +0000 (17:02 +0200)]
net/mlx5e: Add DCBX control interface
Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:07 +0000 (17:02 +0200)]
net/mlx5e: ConnectX-4 firmware support for DCBX
DBCX by default is controlled by firmware where dcbx capability bit
is set. In this mode, firmware is responsible for reading/sending the
TLV packets from/to the remote partner.
This patch sets up the infrastructure to move between HOST/FW DCBX
control mode.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:06 +0000 (17:02 +0200)]
net/mlx5: Add DCBX firmware commands support
Add set/query commands for DCBX_PARAM register
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:05 +0000 (17:02 +0200)]
net/mlx5e: Read ETS settings directly from firmware
Issue description:
Current implementation saves the ETS settings from user in
a temporal soft copy and returns this settings when user
queries the ETS settings.
With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
user will obtain wrong values from the temporal soft copy.
Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
creation.
b. When reading ETS setting from FW, if the traffic class bandwidth
is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
implementation solves the scenarios when the DCBX is in FW control
and willing bit is on which means the ETS setting is dictated
by remote switch.
Also check ETS capability where needed.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:04 +0000 (17:02 +0200)]
net/mlx5e: Support DCBX CEE API
Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.
Note:
priority group in CEE is equivalent to traffic class in ConnectX-4
hardware spec.
bw allocation per priority in CEE is not supported because ConnectX-4
only supports bw allocation per traffic class.
user priority in CEE does not have an equivalent term in ConnectX-4.
Therefore, user priority to priority mapping in CEE is not supported.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:03 +0000 (17:02 +0200)]
net/mlx5e: Add qos capability check
Make sure firmware supports qos before exposing the DCB API.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 25 Nov 2016 04:37:26 +0000 (12:37 +0800)]
virtio-net: enable multiqueue by default
We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:
- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
cases
So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 16:58:57 +0000 (11:58 -0500)]
Merge branch 'MV88E6097-fixes'
Stefan Eichenberger says:
====================
Fix support for the MV88E6097
This patchset fixes the following two issues for the MV88E6097:
- Add missing definition of g1_irqs
- Add missing comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Eichenberger [Fri, 25 Nov 2016 08:41:30 +0000 (09:41 +0100)]
net: dsa: mv88e6xxx: add missing comment for MV88E6097
Add a missing comment for the MV88E6097 because of unification.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Eichenberger [Fri, 25 Nov 2016 08:41:29 +0000 (09:41 +0100)]
net: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097
Add the missing definition of g1_irqs for MV88E6097.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 01:38:49 +0000 (20:38 -0500)]
Merge branch 'bpf-misc-next'
Daniel Borkmann says:
====================
BPF cleanups and misc updates
This patch set adds couple of cleanups in first few patches,
exposes owner_prog_type for array maps as well as mlocked mem
for maps in fdinfo, allows for mount permissions in fs and
fixes various outstanding issues in selftests and samples.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:09 +0000 (01:28 +0100)]
bpf: fix multiple issues in selftest suite and samples
1) The test_lru_map and test_lru_dist fails building on my machine since
the sys/resource.h header is not included.
2) test_verifier fails in one test case where we try to call an invalid
function, since the verifier log output changed wrt printing function
names.
3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
retrieving the number of possible CPUs. This is broken at least in our
scenario and really just doesn't work.
glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
if that fails, depending on the config, it either tries to count CPUs
in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
If /proc/cpuinfo has some issue, it returns just 1 worst case. This
oddity is nothing new [1], but semantics/behaviour seems to be settled.
_SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
that fails it looks into /proc/stat for cpuX entries, and if also that
fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
unlikely all breaks down).
While that might match num_possible_cpus() from the kernel in some
cases, it's really not guaranteed with CPU hotplugging, and can result
in a buffer overflow since the array in user space could have too few
number of slots, and on perpcu map lookup, the kernel will write beyond
that memory of the value buffer.
William Tu reported such mismatches:
[...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
happens when CPU hotadd is enabled. For example, in Fusion when
setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
-smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
will be 2 [2]. [...]
Documentation/cputopology.txt says /sys/devices/system/cpu/possible
outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
own implementation. Later, we could add support to bpf(2) for passing
a mask via CPU_SET(3), for example, to just select a subset of CPUs.
BPF samples code needs this fix as well (at least so that people stop
copying this). Thus, define bpf_num_possible_cpus() once in selftests
and import it from there for the sample code to avoid duplicating it.
The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
After all three issues are fixed, the test suite runs fine again:
# make run_tests | grep self
selftests: test_verifier [PASS]
selftests: test_maps [PASS]
selftests: test_lru_map [PASS]
selftests: test_kmod.sh [PASS]
[1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
Fixes:
3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes:
86af8b4191d2 ("Add sample for adding simple drop program to link")
Fixes:
df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes:
e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes:
ebb676daa1a3 ("bpf: Print function name in addition to function id")
Fixes:
5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:08 +0000 (01:28 +0100)]
bpf: allow for mount options to specify permissions
Since we recently converted the BPF filesystem over to use mount_nodev(),
we now have the possibility to also hold mount options in sb's s_fs_info.
This work implements mount options support for specifying permissions on
the sb's inode, which will be used by tc when it manually needs to mount
the fs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:07 +0000 (01:28 +0100)]
bpf: add owner_prog_type and accounted mem to array map's fdinfo
Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.
The reason that lets us fail at this late stage is that program array
maps are incompatible. Allow users to inspect this earlier after they
got the map fd through BPF_OBJ_GET command. tc will get support for this.
Also, display how much we charged the map with regards to RLIMIT_MEMLOCK.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:06 +0000 (01:28 +0100)]
bpf: reuse dev_is_mac_header_xmit for redirect
Commit
dcf800344a91 ("net/sched: act_mirred: Refactor detection whether
dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's
also useful elsewhere, move it to if_arp.h and reuse it for BPF.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:05 +0000 (01:28 +0100)]
bpf: drop useless bpf_fd member from cls/act
After setup we don't need to keep user space fd number around anymore, as
it also has no useful meaning for anyone, just remove it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:04 +0000 (01:28 +0100)]
bpf: drop unnecessary context cast from BPF_PROG_RUN
Since long already bpf_func is not only about struct sk_buff * as
input anymore. Make it generic as void *, so that callers don't
need to cast for it each time they call BPF_PROG_RUN().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 25 Nov 2016 10:43:04 +0000 (13:43 +0300)]
sfc: remove unneeded variable
We don't use ->heap_buf after commit
46d1efd852cc ("sfc: remove Software
TSO") so let's remove the last traces.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 01:26:59 +0000 (20:26 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2016-11-25' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.10
Major changes:
iwlwifi
* finalize and enable dynamic queue allocation
* use dev_coredumpmsg() to prevent locking the driver
* small fix to pass the AID to the FW
* use FW PS decisions with multi-queue
ath9k
* add device tree bindings
* switch to use mac80211 intermediate software queues to reduce
latency and fix bufferbloat
wl18xx
* allow scanning in AP mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 24 Nov 2016 05:04:08 +0000 (07:04 +0200)]
netdevice: fix sparse warning for HARD_TX_LOCK
sparse warns about context imbalance in any code
that uses HARD_TX_LOCK/UNLOCK - this is because it's
unable to determine that flags don't change so
lock and unlock are paired.
Seems easy enough to fix by adding __acquire/__release
calls.
With this patch af_packet.c is now sparse-clean,
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulrik De Bie [Wed, 23 Nov 2016 20:11:04 +0000 (21:11 +0100)]
ptp: gianfar: Use high resolution frequency method.
This patch depends on commit
d8d263541913 ("ptp: Introduce a high
resolution frequency adjustment method.")
The gianfar devices offer a frequency resolution of about 0.46 ppb
(depends on actual value of tmr_add, for the calculation assumed
0x80000000). This patch lets users of the device benefit from the increased
frequency resolution when tuning the clock. Thanks to the rounding the
maximum error between the requested frequency and the applied frequency
will then be about 0.23 ppb.
Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
on net-next (currently at v4.9-rc5).
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 23 Nov 2016 17:46:52 +0000 (09:46 -0800)]
mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()
Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.
Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.
Interrupt moderation is best effort, and we do not really care of
ultra precise counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 27 Nov 2016 04:42:21 +0000 (23:42 -0500)]
Merge git://git./linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.
Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 27 Nov 2016 01:21:13 +0000 (17:21 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs splice fix from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix default_file_splice_read()
Al Viro [Sun, 27 Nov 2016 01:05:42 +0000 (20:05 -0500)]
fix default_file_splice_read()
Botched calculation of number of pages. As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 26 Nov 2016 23:28:34 +0000 (15:28 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here is a revert and two bugfixes for the I2C designware driver.
Please note that we are still hunting down a regression for the
i2c-octeon driver. While there is a fix pending, we have unclear
feedback from the testers currently. An rc8 would be quite helpful
for this case"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: designware: do not disable adapter after transfer"
i2c: designware: fix rx fifo depth tracking
i2c: designware: report short transfers
Linus Torvalds [Sat, 26 Nov 2016 23:26:20 +0000 (15:26 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
"This resolves the ksyms issues by reverting the commit which
introduced the breakage"
There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
Revert "arm: move exports to definitions"
Linus Torvalds [Sat, 26 Nov 2016 21:05:05 +0000 (13:05 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix leak in fsl/fman driver, from Dan Carpenter.
2) Call flow dissector initcall earlier than any networking driver can
register and start to use it, from Eric Dumazet.
3) Some dup header fixes from Geliang Tang.
4) TIPC link monitoring compat fix from Jon Paul Maloy.
5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
Florian Fainelli.
6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
Mashak.
7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
tipc: resolve connection flow control compatibility problem
mvpp2: use correct size for memset
net/mlx5: drop duplicate header delay.h
net: ieee802154: drop duplicate header delay.h
ibmvnic: drop duplicate header seq_file.h
fsl/fman: fix a leak in tgec_free()
net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
tipc: improve sanity check for received domain records
tipc: fix compatibility bug in link monitoring
net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
dwc_eth_qos: drop duplicate headers
net sched filters: fix filter handle ID in tfilter_notify_chain()
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
bnxt: do not busy-poll when link is down
udplite: call proper backlog handlers
ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
net/mlx4_en: Free netdev resources under state lock
net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
bnxt_en: Fix a VXLAN vs GENEVE issue
...
Linus Torvalds [Sat, 26 Nov 2016 20:24:47 +0000 (12:24 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Fix a crash that occurs at driver initialization if the memory region
is already busy (request_mem_region() fails).
- Fix a vma validation check that mistakenly allows a private device-
dax mapping to be established. Device-dax explicitly forbids private
mappings so it can guarantee a given fault granularity and backing
memory type.
Both of these fixes have soaked in -next and are tagged for -stable.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: fail all private mapping attempts
device-dax: check devm_nsio_enable() return value
Linus Torvalds [Sat, 26 Nov 2016 20:18:59 +0000 (12:18 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"Four fixes for bugs found by syzkaller on x86, all for stable"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: check for pic and ioapic presence before use
KVM: x86: fix out-of-bounds accesses of rtc_eoi map
KVM: x86: drop error recovery in em_jmp_far and em_ret_far
KVM: x86: fix out-of-bounds access in lapic
Linus Torvalds [Sat, 26 Nov 2016 19:24:03 +0000 (11:24 -0800)]
Merge tag 'powerpc-4.9-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Set missing wakeup bit in LPCR on POWER9
- Fix the early OPAL console wrappers
- Fixup kernel read only mapping
Fixes for code merged this cycle:
- Fix missing CRCs, add more asm-prototypes.h declarations"
* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fixup kernel read only mapping
powerpc/boot: Fix the early OPAL console wrappers
powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
powerpc: Set missing wakeup bit in LPCR on POWER9
Jon Paul Maloy [Thu, 24 Nov 2016 23:47:07 +0000 (18:47 -0500)]
tipc: resolve connection flow control compatibility problem
In commit
10724cc7bb78 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of on ack per
256 received message, and found to work fine.
However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also is counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not trigged to send an acknowledge, with a permanently
blocked sender as result.
We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by a making minimal change to the
condition used for determining connection congestion.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 26 Nov 2016 02:22:28 +0000 (21:22 -0500)]
Merge branch 'mlxsw-trap-groups-and-policers'
Jiri Pirko says:
====================
mlxsw: traps, trap groups and policers
Nogah says:
For a packet to be sent from the HW to the cpu, it needs to be trapped.
For a trap to be activate it should be assigned to a trap group.
Those trap groups can have policers, to limit the packet rate (the max
number of packets that can be sent to the cpu in a time slot, the rest
will be discarded) or the data rate (the same, but the count is not by the
number of packets but by their total length in bytes).
This patchset rearrange the trap setting API, re-write the traps and the
trap groups list in spectrum and assign them policers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:47 +0000 (10:33 +0100)]
mlxsw: spectrum: Add policers for trap groups
Configure policers and connect them to trap groups.
While many trap groups share policer's configuration they don't share
the actual policer because each trap group represents a different
flow / protocol and we don't want one of them to be able to exceed its
rate on behalf of another.
For example, if STP and LLDP gets to send 128 packets/sec each, if we
put them in one 256 packets/sec policer, one can send 200 packets while
the other only 50.
Note that IP2ME covers lots of flows, so it's limit is set to match the
cpu ability to handle data.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:46 +0000 (10:33 +0100)]
mlxsw: reg: Add QoS Policer Configuration Register
The QPCR register is used to create and control policers.
A policer can discard or change the color of packets that are
trapped by a specific trap.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:45 +0000 (10:33 +0100)]
mlxsw: resources: Add max cpu policers resource
Add a new resource to resources query: max cpu policers which tells us how
many policers can be used to limit the data rate to the cpu port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:44 +0000 (10:33 +0100)]
mlxsw: Create a different trap group list for each device
Trap groups can be used to control traps priority, both in terms of
which trap "wins" if a packet matches two traps (priority) and in terms
of packets from which trap group will be scheduled to the cpu first (tc).
They can also be used to set rate limiters (policers) on them (will be
added in the next patches).
Currently, we support two trap groups. In Spectrum we want a better
resolution, so every protocol / flow will have a different trap group,
so we can control its parameters separately. Once the policers will be
implemented, it will also allow us limit the rate of each protocol by
itself.
This patch change the trap group list to include:
* the emad trap group, which is shared for all the devices.
* Switchx2's trap groups, which are a copy of the current trap groups.
* Spectrum's new trap groups, in order to match the above guidelines.
(Switchib is using only the emad trap group, so it require no changes).
This patch also includes new configuration for Spectrum's trap groups,
with primary priority order within them.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:43 +0000 (10:33 +0100)]
mlxsw: spectrum: Add BGP trap
Add a trap for BGP protocol that was previously trapped by the generic trap
for IP2ME. This trap will allow us to have better control (over priority
and rate) of the traffic.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:42 +0000 (10:33 +0100)]
mlxsw: Change trap groups setting
Trap groups have many options which we currently set to default values.
In the next patches we will use many of them with non-default values.
Some of these options have no default value, so this patch sets them as
params for the trap group set function. Others almost always use the same
values, so the set function will use this default values. In the rare cases
when they will need to be with other values, these values can be set
directly (using the macros for fields in registers).
Parameters without default value:
TC - the traffic class for packets that hit this trap group.
(old default is the max tc)
priority - if one packet hits multiple trap groups, the group with the
higher priority will "catch" it. (old default is 0)
policer - limit rate policer (old default is disabled)
Default parameters:
swid - switch id, relevant for the emad trap only, ignored on Spectrum.
(new default is 0)
rdq - CPU receive descriptor queue (new default is identical to trap
group id)
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:41 +0000 (10:33 +0100)]
mlxsw: resources: Add max trap groups resource
Add the max number of trap groups to resource query.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:40 +0000 (10:33 +0100)]
mlxsw: core: Change emad trap group settings
Currently, the emad trap init was done in the core. In the future we will
want to add some changes to the traps groups, according to device type.
This commit create a driver function to create the trap group for the
emad, so later it can be changed by devices. It also changes the emad
registration to use the new generic functions.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:39 +0000 (10:33 +0100)]
mlxsw: Add option to choose trap group
Currently, we set the trap group to pre-determined option, based on whether
it is an rx or event trap.
This commit adds a possibility to chose the trap group, so it can be set
to different values in the following patches.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:38 +0000 (10:33 +0100)]
mlxsw: Change trap set function
Change trap setting function so instead of determining the trap group by
trap id, it gets it as a parameter (so later we can have different trap
groups for Spectrum and Switchx2).
Add "is_ctrl" parameter to the trap setting function. It control whether
the trapped packets wait in a designated control buffer or in their
default one. This parameter is ignored by Switchx2 and Switchib.
Add these parameters to the traps array in Spectrum, Switchx2 and
Switchib.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:37 +0000 (10:33 +0100)]
mlxsw: switchib: Use generic listener struct for events
Change the event handling in Switchib to be comptible with Spectrum and
Switchx2.
Use the generic listener struct for the events. Init and fini them by loop
(and not by calling each event by its name).
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:36 +0000 (10:33 +0100)]
mlxsw: switchx2: Use generic listener struct for events
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:35 +0000 (10:33 +0100)]
mlxsw: spectrum: Use generic listener struct for events
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:34 +0000 (10:33 +0100)]
mlxsw: core: Introduce generic macro for event
Create a macro for creating the generic listener struct for events,
similar to the one for rx traps.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:33 +0000 (10:33 +0100)]
mlxsw: switchx2: Use generic listener struct for rx traps
Reorganize the traps to use the new generic listener struct and
functions. Use macros to shorten the traps list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:32 +0000 (10:33 +0100)]
mlxsw: spectrum: Use generic listener struct for rx traps
Replace the old rx listener struct definitions by the generic ones.
Use the new generic registering / unregistering functions for them.
Add some macros to organize the trap list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:31 +0000 (10:33 +0100)]
mlxsw: core: Expose generic macros for rx trap
In Spectrum, there is a macro to arrange the traps list.
This macro is useful for everyone who is using rx traps.
Create a similar macro in core.h for creating the generic listener struct
for rx traps.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:30 +0000 (10:33 +0100)]
mlxsw: core: Create a generic function to register / unregister traps
We have 2 types of HW traps to handle, rx traps and events.
The registration workflow for both is very similar. So it only make
sense to create one function to handle both.
This patch creates a struct to hold the data for both cases. It also
creates a registration and an un-registration functions that get this
generic struct as input.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:29 +0000 (10:33 +0100)]
mlxsw: spectrum: Remove unused traps
Since commit
99724c18fc66 ("mlxsw: spectrum: Introduce support for
router interfaces") we no longer rely on flooding traffic to the CPU in
order to trap packets intended for the host itself. Therefore, the FDB
MC trap can be removed.
Remove traps for protocols that are not supported yet.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 24 Nov 2016 16:28:12 +0000 (17:28 +0100)]
mvpp2: use correct size for memset
gcc-7 detects a short memset in mvpp2, introduced in the original
merge of the driver:
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_cls_init':
drivers/net/ethernet/marvell/mvpp2.c:3296:2: error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]
The result seems to be that we write uninitialized data into the
flow table registers, although we did not get any warning about
that uninitialized data usage.
Using sizeof() lets us initialize then entire array instead.
Fixes:
3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 24 Nov 2016 13:58:33 +0000 (21:58 +0800)]
net/mlx5: drop duplicate header delay.h
Drop duplicate header delay.h from mlx5/core/main.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 24 Nov 2016 13:58:32 +0000 (21:58 +0800)]
net: ieee802154: drop duplicate header delay.h
Drop duplicate header delay.h from adf7242.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 24 Nov 2016 13:58:29 +0000 (21:58 +0800)]
ibmvnic: drop duplicate header seq_file.h
Drop duplicate header seq_file.h from ibmvnic.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>