David S. Miller [Thu, 15 Aug 2019 19:04:24 +0000 (12:04 -0700)]
Merge branch 'rds-next'
Gerd Rausch says:
====================
net/rds: Fixes from internal Oracle repo
This is the first set of (mostly old) patches from our internal repository
in an effort to synchronize what Oracle had been using internally
with what is shipped with the Linux kernel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 13 Jan 2011 19:40:31 +0000 (11:40 -0800)]
rds: check for excessive looping in rds_send_xmit
Original commit from 2011 updated to include a change by
Yuval Shaia <yuval.shaia@oracle.com>
that adds a new statistic counter "send_stuck_rm"
to capture the messages looping exessively
in the send path.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerd Rausch [Thu, 11 Jul 2019 19:15:50 +0000 (12:15 -0700)]
net/rds: Add a few missing rds_stat_names entries
In a previous commit, fields were added to "struct rds_statistics"
but array "rds_stat_names" was not updated accordingly.
Please note the inconsistent naming of the string representations
that is done in the name of compatibility
with the Oracle internal code-base.
s_recv_bytes_added_to_socket -> "recv_bytes_added_to_sock"
s_recv_bytes_removed_from_socket -> "recv_bytes_freed_fromsock"
Fixes:
192a798f5299 ("RDS: add stat for socket recv memory usage")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mason [Fri, 3 Feb 2012 16:08:51 +0000 (11:08 -0500)]
RDS: don't use GFP_ATOMIC for sk_alloc in rds_create
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS: limit the number of times we loop in rds_send_xmit
This will kick the RDS worker thread if we have been looping
too long.
Original commit from 2012 updated to include a change by
Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
that triggers "must_wake" if "rds_ib_recv_refill_one" fails.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Aug 2019 19:02:44 +0000 (12:02 -0700)]
Merge branch 'netdevsim-implement-support-for-devlink-region-and-snapshots'
Jiri Pirko says:
====================
netdevsim: implement support for devlink region and snapshots
Implement devlink region support for netdevsim and test it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Aug 2019 13:46:34 +0000 (15:46 +0200)]
selftests: netdevsim: add devlink regions tests
Test netdevsim devlink region implementation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Aug 2019 13:46:33 +0000 (15:46 +0200)]
netdevsim: implement support for devlink region and snapshots
Implement dummy region of size 32K and allow user to create snapshots
or random data using debugfs file trigger.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Aug 2019 19:01:22 +0000 (12:01 -0700)]
Merge branch 'selftests-netdevsim-add-devlink-paramstests'
Jiri Pirko says:
====================
selftests: netdevsim: add devlink paramstests
The first patch is just a helper addition as a dependency of the actual
test in patch number two.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Aug 2019 13:42:29 +0000 (15:42 +0200)]
selftests: netdevsim: add devlink params tests
Test recently added netdevsim devlink param implementation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Aug 2019 13:42:28 +0000 (15:42 +0200)]
selftests: net: push jq workaround into separate helper
Push the jq return value workaround code into a separate helper so it
could be used by the rest of the code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 15 Aug 2019 12:41:00 +0000 (20:41 +0800)]
page_pool: remove unnecessary variable init
Remove variable initializations in functions that
are followed by assignments before use
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 15 Aug 2019 12:21:30 +0000 (14:21 +0200)]
r8169: sync EEE handling for RTL8168h with vendor driver
Sync EEE init for RTL8168h with vendor driver and add two writes to
vendor-specific registers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Aug 2019 18:48:30 +0000 (11:48 -0700)]
Merge branch 'realtek-EEE'
Heiner Kallweit says:
====================
net: phy: realtek: map vendor-specific EEE registers to standard MMD registers
EEE-related registers on newer integrated PHY's have the standard
layout, but are accessible not via MMD but via vendor-specific
registers. Emulating the standard MMD registers allows to use the
generic functions for EEE control and to significantly simplify
the r8169 driver.
v2:
- rebase patch 2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 15 Aug 2019 12:14:18 +0000 (14:14 +0200)]
r8169: use the generic EEE management functions
Now that the Realtek PHY driver maps the vendor-specific EEE registers
to the standard MMD registers, we can remove all special handling and
use the generic functions phy_ethtool_get/set_eee.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 15 Aug 2019 12:12:55 +0000 (14:12 +0200)]
net: phy: realtek: add support for EEE registers on integrated PHY's
EEE-related registers on newer integrated PHY's have the standard
layout, but are accessible not via MMD but via vendor-specific
registers. Emulating the standard MMD registers allows to use the
generic functions for EEE control.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 15 Aug 2019 11:19:22 +0000 (13:19 +0200)]
net: phy: swphy: emulate register MII_ESTATUS
When the genphy driver binds to a swphy it will call
genphy_read_abilites that will try to read MII_ESTATUS if BMSR_ESTATEN
is set in MII_BMSR. So far this would read the default value 0xffff
and 1000FD and 1000HD are reported as supported just by chance.
Better add explicit support for emulating MII_ESTATUS.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 15 Aug 2019 11:15:19 +0000 (13:15 +0200)]
net: phy: read MII_CTRL1000 in genphy_read_status only if needed
Value of MII_CTRL1000 is needed only if LPA_1000MSFAIL is set.
Therefore move reading this register.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ka-Cheong Poon [Thu, 15 Aug 2019 09:36:43 +0000 (02:36 -0700)]
net/rds: Add RDS6_INFO_SOCKETS and RDS6_INFO_RECV_MESSAGES options
Add support of the socket options RDS6_INFO_SOCKETS and
RDS6_INFO_RECV_MESSAGES which update the RDS_INFO_SOCKETS and
RDS_INFO_RECV_MESSAGES options respectively. The old options work
for IPv4 sockets only.
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Tue, 13 Aug 2019 08:00:25 +0000 (10:00 +0200)]
net/mvpp2: Replace tasklet with softirq hrtimer
The tx_done_tasklet tasklet is used in invoke the hrtimer
(mvpp2_hr_timer_cb) in softirq context. This can be also achieved without
the tasklet but with HRTIMER_MODE_SOFT as hrtimer mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Aug 2019 02:59:00 +0000 (19:59 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next.
This round addresses fallout from previous pull request:
1) Remove #warning from ipt_LOG.h and ip6t_LOG.h headers,
from Jeremy Sowden.
2) Incorrect parens in memcmp() in nft_bitwise, from Nathan Chancellor.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Wed, 14 Aug 2019 16:58:09 +0000 (09:58 -0700)]
netfilter: nft_bitwise: Adjust parentheses to fix memcmp size argument
clang warns:
net/netfilter/nft_bitwise.c:138:50: error: size argument in 'memcmp'
call is a comparison [-Werror,-Wmemsize-comparison]
if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
~~~~~~~~~~~~~~~~~~^~
net/netfilter/nft_bitwise.c:138:6: note: did you mean to compare the
result of 'memcmp' instead?
if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
^
)
net/netfilter/nft_bitwise.c:138:32: note: explicitly cast the argument
to size_t to silence this warning
if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
^
(size_t)(
1 error generated.
Adjust the parentheses so that the result of the sizeof is used for the
size argument in memcmp, rather than the result of the comparison (which
would always be true because sizeof is a non-zero number).
Fixes:
bd8699e9e292 ("netfilter: nft_bitwise: add offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/638
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 14 Aug 2019 08:01:28 +0000 (09:01 +0100)]
netfilter: remove deprecation warnings from uapi headers.
There are two netfilter userspace headers which contain deprecation
warnings. While these headers are not used within the kernel, they are
compiled stand-alone for header-testing.
Pablo informs me that userspace iptables still refer to these headers,
and the intention was to use xt_LOG.h instead and remove these, but
userspace was never updated.
Remove the warnings.
Fixes:
2a475c409fe8 ("kbuild: remove all netfilter headers from header-test blacklist.")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Heiner Kallweit [Tue, 13 Aug 2019 06:09:32 +0000 (08:09 +0200)]
net: phy: realtek: add NBase-T PHY auto-detection
Realtek provided information on how the new NIC-integrated PHY's
expose whether they support 2.5G/5G/10G. This allows to automatically
differentiate 1Gbps and 2.5Gbps PHY's, and therefore allows to
remove the fake PHY ID mechanism for RTL8125.
So far RTL8125 supports 2.5Gbps only, but register layout for faster
modes has been defined already, so let's use this information to be
future-proof.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 12 Aug 2019 18:47:40 +0000 (20:47 +0200)]
r8169: fix sporadic transmit timeout issue
Holger reported sporadic transmit timeouts and it turned out that one
path misses ringing the doorbell. Fix was suggested by Eric.
Fixes:
ef14358546b1 ("r8169: make use of xmit_more")
Suggested-by: Eric Dumazet <edumazet@google.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 14 Aug 2019 01:22:57 +0000 (18:22 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.
2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.
3) More strict validation of IPVS sysctl values, from Junwei Hu.
4) Remove unnecessary spaces after on the right hand side of assignments,
from yangxingwu.
5) Add offload support for bitwise operation.
6) Extend the nft_offload_reg structure to store immediate date.
7) Collapse several ip_set header files into ip_set.h, from
Jeremy Sowden.
8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
from Jeremy Sowden.
9) Fix several sparse warnings due to missing prototypes, from
Valdis Kletnieks.
10) Use static lock initialiser to ensure connlabel spinlock is
initialized on boot time to fix sched/act_ct.c, patch
from Florian Westphal.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Jakub Kicinski [Wed, 14 Aug 2019 01:12:45 +0000 (18:12 -0700)]
Merge branch 'r8152-RX-improve'
Hayes says:
====================
v2:
For patch #2, replace list_for_each_safe with list_for_each_entry_safe.
Remove unlikely in WARN_ON. Adjust the coding style.
For patch #4, replace list_for_each_safe with list_for_each_entry_safe.
Remove "else" after "continue".
For patch #5. replace sysfs with ethtool to modify rx_copybreak and
rx_pending.
v1:
The different chips use different rx buffer size.
Use skb_add_rx_frag() to reduce memory copy for RX.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Hayes Wang [Tue, 13 Aug 2019 03:42:09 +0000 (11:42 +0800)]
r8152: change rx_copybreak and rx_pending through ethtool
Let the rx_copybreak and rx_pending could be modified by
ethtool.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Hayes Wang [Tue, 13 Aug 2019 03:42:08 +0000 (11:42 +0800)]
r8152: support skb_add_rx_frag
Use skb_add_rx_frag() to reduce the memory copy for rx data.
Use a new list of rx_used to store the rx buffer which couldn't be
reused yet.
Besides, the total number of rx buffer may be increased or decreased
dynamically. And it is limited by RTL8152_MAX_RX_AGG.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Hayes Wang [Tue, 13 Aug 2019 03:42:07 +0000 (11:42 +0800)]
r8152: use alloc_pages for rx buffer
Replace kmalloc_node() with alloc_pages() for rx buffer.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Hayes Wang [Tue, 13 Aug 2019 03:42:06 +0000 (11:42 +0800)]
r8152: replace array with linking list for rx information
The original method uses an array to store the rx information. The
new one uses a list to link each rx structure. Then, it is possible
to increase/decrease the number of rx structure dynamically.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Hayes Wang [Tue, 13 Aug 2019 03:42:05 +0000 (11:42 +0800)]
r8152: separate the rx buffer size
The different chips may accept different rx buffer sizes. The RTL8152
supports 16K bytes, and RTL8153 support 32K bytes.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Jakub Kicinski [Wed, 14 Aug 2019 00:16:11 +0000 (17:16 -0700)]
Merge branch 'net-phy-let-phy_speed_down-up-support-speeds-1Gbps'
Heiner says:
====================
So far phy_speed_down/up can be used up to 1Gbps only. Remove this
restriction and add needed helpers to phy-core.c
v2:
- remove unused parameter in patch 1
- rename __phy_speed_down to phy_speed_down_core in patch 2
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Heiner Kallweit [Mon, 12 Aug 2019 21:52:19 +0000 (23:52 +0200)]
net: phy: let phy_speed_down/up support speeds >1Gbps
So far phy_speed_down/up can be used up to 1Gbps only. Remove this
restriction by using new helper __phy_speed_down. New member adv_old
in struct phy_device is used by phy_speed_up to restore the advertised
modes before calling phy_speed_down. Don't simply advertise what is
supported because a user may have intentionally removed modes from
advertisement.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Heiner Kallweit [Mon, 12 Aug 2019 21:51:27 +0000 (23:51 +0200)]
net: phy: add phy_speed_down_core and phy_resolve_min_speed
phy_speed_down_core provides most of the functionality for
phy_speed_down. It makes use of new helper phy_resolve_min_speed that is
based on the sorting of the settings[] array. In certain cases it may be
helpful to be able to exclude legacy half duplex modes, therefore
prepare phy_resolve_min_speed() for it.
v2:
- rename __phy_speed_down to phy_speed_down_core
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Heiner Kallweit [Mon, 12 Aug 2019 21:50:30 +0000 (23:50 +0200)]
net: phy: add __set_linkmode_max_speed
We will need the functionality of __set_linkmode_max_speed also for
linkmode bitmaps other than phydev->supported. Therefore split it.
v2:
- remove unused parameter from __set_linkmode_max_speed
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Vlad Buslov [Mon, 12 Aug 2019 17:02:02 +0000 (20:02 +0300)]
net: devlink: remove redundant rtnl lock assert
It is enough for caller of devlink_compat_switch_id_get() to hold the net
device to guarantee that devlink port is not destroyed concurrently. Remove
rtnl lock assertion and modify comment to warn user that they must hold
either rtnl lock or reference to net device. This is necessary to
accommodate future implementation of rtnl-unlocked TC offloads driver
callbacks.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Jakub Kicinski [Tue, 13 Aug 2019 23:24:57 +0000 (16:24 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):
for (i = 1; i <= btf__get_nr_types(btf); i++) {
t = (struct btf_type *)btf__type_by_id(btf, i);
if (!has_datasec && btf_is_var(t)) {
/* replace VAR with INT */
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
<<<<<<< HEAD
/*
* using size = 1 is the safest choice, 4 will be too
* big and cause kernel BTF validation failure if
* original variable took less than 4 bytes
*/
t->size = 1;
*(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
} else if (!has_datasec && kind == BTF_KIND_DATASEC) {
=======
t->size = sizeof(int);
*(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
} else if (!has_datasec && btf_is_datasec(t)) {
>>>>>>>
72ef80b5ee131e96172f19e74b4f98fa3404efe8
/* replace DATASEC with STRUCT */
Conflict is between the two commits
1d4126c4e119 ("libbpf: sanitize VAR to
conservative 1-byte INT") and
b03bc6853c0e ("libbpf: convert libbpf code to
use new btf helpers"), so we need to pick the sanitation fixup as well as
use the new btf_is_datasec() helper and the whitespace cleanup. Looks like
the following:
[...]
if (!has_datasec && btf_is_var(t)) {
/* replace VAR with INT */
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
/*
* using size = 1 is the safest choice, 4 will be too
* big and cause kernel BTF validation failure if
* original variable took less than 4 bytes
*/
t->size = 1;
*(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
} else if (!has_datasec && btf_is_datasec(t)) {
/* replace DATASEC with STRUCT */
[...]
The main changes are:
1) Addition of core parts of compile once - run everywhere (co-re) effort,
that is, relocation of fields offsets in libbpf as well as exposure of
kernel's own BTF via sysfs and loading through libbpf, from Andrii.
More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
and http://vger.kernel.org/lpc-bpf2018.html#session-2
2) Enable passing input flags to the BPF flow dissector to customize parsing
and allowing it to stop early similar to the C based one, from Stanislav.
3) Add a BPF helper function that allows generating SYN cookies from XDP and
tc BPF, from Petar.
4) Add devmap hash-based map type for more flexibility in device lookup for
redirects, from Toke.
5) Improvements to XDP forwarding sample code now utilizing recently enabled
devmap lookups, from Jesper.
6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
and Takshak.
7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.
8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.
9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.
10) Add perf event output helper also for other skb-based program types, from Allan.
11) Fix a co-re related compilation error in selftests, from Yonghong.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
YueHaibing [Mon, 12 Aug 2019 14:41:56 +0000 (22:41 +0800)]
net: hns3: Make hclge_func_reset_sync_vf static
Fix sparse warning:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:3190:5:
warning: symbol 'hclge_func_reset_sync_vf' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Jiri Pirko [Mon, 12 Aug 2019 12:28:31 +0000 (14:28 +0200)]
devlink: send notifications for deleted snapshots on region destroy
Currently the notifications for deleted snapshots are sent only in case
user deletes a snapshot manually. Send the notifications in case region
is destroyed too.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Daniel Borkmann [Tue, 13 Aug 2019 21:19:42 +0000 (23:19 +0200)]
Merge branch 'bpf-libbpf-read-sysfs-btf'
Andrii Nakryiko says:
====================
Now that kernel's BTF is exposed through sysfs at well-known location, attempt
to load it first as a target BTF for the purpose of BPF CO-RE relocations.
Patch #1 is a follow-up patch to rename /sys/kernel/btf/kernel into
/sys/kernel/btf/vmlinux.
Patch #2 adds ability to load raw BTF contents from sysfs and expands the list
of locations libbpf attempts to load vmlinux BTF from.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Andrii Nakryiko [Tue, 13 Aug 2019 18:54:43 +0000 (11:54 -0700)]
libbpf: attempt to load kernel BTF from sysfs first
Add support for loading kernel BTF from sysfs (/sys/kernel/btf/vmlinux)
as a target BTF. Also extend the list of on disk search paths for
vmlinux ELF image with entries that perf is searching for.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Andrii Nakryiko [Tue, 13 Aug 2019 18:54:42 +0000 (11:54 -0700)]
btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux
Expose kernel's BTF under the name vmlinux to be more uniform with using
kernel module names as file names in the future.
Fixes:
341dfcf8d78e ("btf: expose BTF info through sysfs")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Petar Penkov [Mon, 12 Aug 2019 23:30:39 +0000 (16:30 -0700)]
selftests/bpf: fix race in flow dissector tests
Since the "last_dissection" map holds only the flow keys for the most
recent packet, there is a small race in the skb-less flow dissector
tests if a new packet comes between transmitting the test packet, and
reading its keys from the map. If this happens, the test packet keys
will be overwritten and the test will fail.
Changing the "last_dissection" map to a hash map, keyed on the
source/dest port pair resolves this issue. Additionally, let's clear the
last test results from the map between tests to prevent previous test
cases from interfering with the following test cases.
Fixes:
0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode")
Signed-off-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Peter Wu [Tue, 13 Aug 2019 00:38:33 +0000 (01:38 +0100)]
tools: bpftool: add feature check for zlib
bpftool requires libelf, and zlib for decompressing /proc/config.gz.
zlib is a transitive dependency via libelf, and became mandatory since
elfutils 0.165 (Jan 2016). The feature check of libelf is already done
in the elfdep target of tools/lib/bpf/Makefile, pulled in by bpftool via
a dependency on libbpf.a. Add a similar feature check for zlib.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Andrii Nakryiko [Mon, 12 Aug 2019 18:39:47 +0000 (11:39 -0700)]
btf: expose BTF info through sysfs
Make .BTF section allocated and expose its contents through sysfs.
/sys/kernel/btf directory is created to contain all the BTFs present
inside kernel. Currently there is only kernel's main BTF, represented as
/sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
each module will expose its BTF as /sys/kernel/btf/<module-name> file.
Current approach relies on a few pieces coming together:
1. pahole is used to take almost final vmlinux image (modulo .BTF and
kallsyms) and generate .BTF section by converting DWARF info into
BTF. This section is not allocated and not mapped to any segment,
though, so is not yet accessible from inside kernel at runtime.
2. objcopy dumps .BTF contents into binary file and subsequently
convert binary file into linkable object file with automatically
generated symbols _binary__btf_kernel_bin_start and
_binary__btf_kernel_bin_end, pointing to start and end, respectively,
of BTF raw data.
3. final vmlinux image is generated by linking this object file (and
kallsyms, if necessary). sysfs_btf.c then creates
/sys/kernel/btf/kernel file and exposes embedded BTF contents through
it. This allows, e.g., libbpf and bpftool access BTF info at
well-known location, without resorting to searching for vmlinux image
on disk (location of which is not standardized and vmlinux image
might not be even available in some scenarios, e.g., inside qemu
during testing).
Alternative approach using .incbin assembler directive to embed BTF
contents directly was attempted but didn't work, because sysfs_proc.o is
not re-compiled during link-vmlinux.sh stage. This is required, though,
to update embedded BTF data (initially empty data is embedded, then
pahole generates BTF info and we need to regenerate sysfs_btf.o with
updated contents, but it's too late at that point).
If BTF couldn't be generated due to missing or too old pahole,
sysfs_btf.c handles that gracefully by detecting that
_binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
/sys/kernel/btf at all.
v2->v3:
- added Documentation/ABI/testing/sysfs-kernel-btf (Greg K-H);
- created proper kobject (btf_kobj) for btf directory (Greg K-H);
- undo v2 change of reusing vmlinux, as it causes extra kallsyms pass
due to initially missing __binary__btf_kernel_bin_{start/end} symbols;
v1->v2:
- allow kallsyms stage to re-use vmlinux generated by gen_btf();
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Florian Westphal [Mon, 12 Aug 2019 11:40:04 +0000 (13:40 +0200)]
netfilter: connlabels: prefer static lock initialiser
seen during boot:
BUG: spinlock bad magic on CPU#2, swapper/0/1
lock: nf_connlabels_lock+0x0/0x60, .magic:
00000000, .owner: <none>/-1, .owner_cpu: 0
Call Trace:
do_raw_spin_lock+0x14e/0x1b0
nf_connlabels_get+0x15/0x40
ct_init_net+0xc4/0x270
ops_init+0x56/0x1c0
register_pernet_operations+0x1c8/0x350
register_pernet_subsys+0x1f/0x40
tcf_register_action+0x7c/0x1a0
do_one_initcall+0x13d/0x2d9
Problem is that ct action init function can run before
connlabels_init(). Lock has not been initialised yet.
Fix it by using a static initialiser.
Fixes:
b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Valdis Klētnieks [Thu, 8 Aug 2019 05:43:22 +0000 (01:43 -0400)]
netfilter: nf_nat_proto: make tables static
Sparse warns about two tables not being declared.
CHECK net/netfilter/nf_nat_proto.c
net/netfilter/nf_nat_proto.c:725:26: warning: symbol 'nf_nat_ipv4_ops' was not declared. Should it be static?
net/netfilter/nf_nat_proto.c:964:26: warning: symbol 'nf_nat_ipv6_ops' was not declared. Should it be static?
And in fact they can indeed be static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Valdis Klētnieks [Thu, 8 Aug 2019 05:28:08 +0000 (01:28 -0400)]
netfilter: nf_tables: add missing prototypes.
Sparse rightly complains about undeclared symbols.
CHECK net/netfilter/nft_set_hash.c
net/netfilter/nft_set_hash.c:647:21: warning: symbol 'nft_set_rhash_type' was not declared. Should it be static?
net/netfilter/nft_set_hash.c:670:21: warning: symbol 'nft_set_hash_type' was not declared. Should it be static?
net/netfilter/nft_set_hash.c:690:21: warning: symbol 'nft_set_hash_fast_type' was not declared. Should it be static?
CHECK net/netfilter/nft_set_bitmap.c
net/netfilter/nft_set_bitmap.c:296:21: warning: symbol 'nft_set_bitmap_type' was not declared. Should it be static?
CHECK net/netfilter/nft_set_rbtree.c
net/netfilter/nft_set_rbtree.c:470:21: warning: symbol 'nft_set_rbtree_type' was not declared. Should it be static?
Include nf_tables_core.h rather than nf_tables.h to pick up the additional definitions.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:17:05 +0000 (15:17 +0100)]
kbuild: remove all netfilter headers from header-test blacklist.
All the blacklisted NF headers can now be compiled stand-alone, so
removed them from the blacklist.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:17:04 +0000 (15:17 +0100)]
netfilter: remove "#ifdef __KERNEL__" guards from some headers.
A number of non-UAPI Netfilter header-files contained superfluous
"#ifdef __KERNEL__" guards. Removed them.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:17:03 +0000 (15:17 +0100)]
netfilter: add missing IS_ENABLED(CONFIG_NETFILTER) checks to some header-files.
linux/netfilter.h defines a number of struct and inline function
definitions which are only available is CONFIG_NETFILTER is enabled.
These structs and functions are used in declarations and definitions in
other header-files. Added preprocessor checks to make sure these
headers will compile if CONFIG_NETFILTER is disabled.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:17:02 +0000 (15:17 +0100)]
netfilter: add missing IS_ENABLED(CONFIG_NF_CONNTRACK) checks to some header-files.
struct nf_conn contains a "struct nf_conntrack ct_general" member and
struct net contains a "struct netns_ct ct" member which are both only
defined in CONFIG_NF_CONNTRACK is enabled. These members are used in a
number of inline functions defined in other header-files. Added
preprocessor checks to make sure the headers will compile if
CONFIG_NF_CONNTRACK is disabled.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:17:01 +0000 (15:17 +0100)]
netfilter: add missing IS_ENABLED(CONFIG_NF_TABLES) check to header-file.
nf_tables.h defines an API comprising several inline functions and
macros that depend on the nft member of struct net. However, this is
only defined is CONFIG_NF_TABLES is enabled. Added preprocessor checks
to ensure that nf_tables.h will compile if CONFIG_NF_TABLES is disabled.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:17:00 +0000 (15:17 +0100)]
netfilter: add missing IS_ENABLED(CONFIG_BRIDGE_NETFILTER) checks to header-file.
br_netfilter.h defines inline functions that use an enum constant and
struct member that are only defined if CONFIG_BRIDGE_NETFILTER is
enabled. Added preprocessor checks to ensure br_netfilter.h will
compile if CONFIG_BRIDGE_NETFILTER is disabled.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:16:59 +0000 (15:16 +0100)]
netfilter: add missing includes to a number of header-files.
A number of netfilter header-files used declarations and definitions
from other headers without including them. Added include directives to
make those declarations and definitions available.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:16:58 +0000 (15:16 +0100)]
netfilter: inline four headers files into another one.
linux/netfilter/ipset/ip_set.h included four other header files:
include/linux/netfilter/ipset/ip_set_comment.h
include/linux/netfilter/ipset/ip_set_counter.h
include/linux/netfilter/ipset/ip_set_skbinfo.h
include/linux/netfilter/ipset/ip_set_timeout.h
Of these the first three were not included anywhere else. The last,
ip_set_timeout.h, was included in a couple of other places, but defined
inline functions which call other inline functions defined in ip_set.h,
so ip_set.h had to be included before it.
Inlined all four into ip_set.h, and updated the other files that
included ip_set_timeout.h.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 1 Aug 2019 12:09:26 +0000 (14:09 +0200)]
netfilter: nf_tables: store data in offload context registers
Store immediate data into offload context register. This allows follow
up instructions to take it from the corresponding source register.
This patch is required to support for payload mangling, although other
instructions that take data from source register will benefit from this
too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 30 Jul 2019 11:32:01 +0000 (13:32 +0200)]
netfilter: nft_bitwise: add offload support
Extract mask from bitwise operation and store it into the corresponding
context register so the cmp instruction can set the mask accordingly.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
yangxingwu [Tue, 16 Jul 2019 02:13:01 +0000 (10:13 +0800)]
netfilter: remove unnecessary spaces
This patch removes extra spaces.
Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Peter Wu [Fri, 9 Aug 2019 00:39:11 +0000 (01:39 +0100)]
tools: bpftool: fix reading from /proc/config.gz
/proc/config has never existed as far as I can see, but /proc/config.gz
is present on Arch Linux. Add support for decompressing config.gz using
zlib which is a mandatory dependency of libelf anyway. Replace existing
stdio functions with gzFile operations since the latter transparently
handles uncompressed and gzip-compressed files.
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:42:43 +0000 (12:42 +0200)]
caif: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Richard Fontana <rfontana@redhat.com>
Cc: Steve Winslow <swinslow@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:31:08 +0000 (12:31 +0200)]
xen-netback: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Aug 2019 04:27:15 +0000 (21:27 -0700)]
Merge branch 'net-dsa-mv88e6xxx-prepare-Wait-Bit-operation'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: prepare Wait Bit operation
The Remote Management Interface has its own implementation of a Wait
Bit operation, which requires a bit number and a value to wait for.
In order to prepare the introduction of this implementation, rework the
code waiting for bits and masks in mv88e6xxx to match this signature.
This has the benefit to unify the implementation of wait routines while
removing obsolete wait and update functions and also reducing the code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:59 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: add delay in direct SMI wait
The mv88e6xxx_smi_direct_wait routine is used to wait on indirect
registers access. It is of no exception and must delay between read
attempts, like other wait routines.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:58 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: fix SMI bit checking
The current mv88e6xxx_smi_direct_wait function is only used to check
the 16th bit of the (16-bit) SMI Command register. But the bit shift
operation is not enough if we eventually use this function to check
other bits, thus replace it with a mask.
Fixes:
e7ba0fad9c53 ("net: dsa: mv88e6xxx: refine SMI support")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:57 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: remove wait and update routines
Now that we have proper Wait Bit and Wait Mask routines, remove the
unused mv88e6xxx_wait routine and its Global 1 and Global 2 variants.
The indirect tables such as the Device Mapping Table or Priority
Override Table make use of an Update bit to distinguish reading (0)
from writing (1) operations. After a write operation occurs, the bit
self clears right away so there's no need to wait on it. Thus keep
things simple and remove the mv88e6xxx_update helper as well.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:56 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: wait for AVB Busy bit
The AVB is not an indirect table using an Update bit, but a unit using
a Busy bit. This means that we must ensure that this bit is cleared
before setting it and wait until it gets cleared again after writing
an operation. Reflect that.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:55 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: introduce wait bit routine
Many portions of the driver need to wait until a given bit is set
or cleared. Some busses even have a specific implementation for this
operation. In preparation for such variant, implement a generic Wait
Bit routine that can be used by the driver core functions.
This allows us to get rid of the custom implementations we may find
in the driver. Note that for the EEPROM bits, BUSY and RUNNING bits
are independent, thus it is more efficient to wait independently for
each bit instead of waiting for their mask.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:54 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: introduce wait mask routine
The current mv88e6xxx_wait routine is used to wait for a given mask
to be cleared to zero. However in some cases, the driver may have
to wait for a given mask to be of a certain non-zero value.
Thus provide a generic wait mask routine that will be used to implement
the current mv88e6xxx_wait function, and use it to wait for 88E6185
PPU states.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 9 Aug 2019 22:47:53 +0000 (18:47 -0400)]
net: dsa: mv88e6xxx: wait for 88E6185 PPU disabled
The PPU state of 88E6185 can be either "Disabled at Reset" or
"Disabled after Initialization". Because we intentionally clear the
PPU Enabled bit before checking its state, it is safe to wait for the
MV88E6185_G1_STS_PPU_STATE_DISABLED state explicitly instead of waiting
for any state different than MV88E6185_G1_STS_PPU_STATE_POLLING.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 9 Aug 2019 20:59:07 +0000 (22:59 +0200)]
r8169: inline rtl8169_free_rx_databuff
rtl8169_free_rx_databuff is used in only one place, so let's inline it.
We can improve the loop because rtl8169_init_ring zero's RX_databuff
before calling rtl8169_rx_fill, and rtl8169_rx_fill fills
Rx_databuff starting from index 0.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Aug 2019 04:24:32 +0000 (21:24 -0700)]
Merge branch 'realtek-phy-next'
Heiner Kallweit says:
====================
net: phy: realtek: add support for integrated 2.5Gbps PHY in RTL8125
This series adds support for the integrated 2.5Gbps PHY in RTL8125.
First three patches add necessary functionality to phylib.
Changes in v2:
- added patch 1
- changed patch 4 to use a fake PHY ID that is injected by the
network driver. This allows to use a dedicated PHY driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 9 Aug 2019 18:45:14 +0000 (20:45 +0200)]
net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125
This adds support for the integrated 2.5Gbps PHY in Realtek RTL8125.
Advertisement of 2.5Gbps mode is done via a vendor-specific register.
Same applies to reading NBase-T link partner advertisement.
Unfortunately this 2.5Gbps PHY shares the PHY ID with the integrated
1Gbps PHY's in other Realtek network chips and so far no method is
known to differentiate them. As a workaround use a dedicated fake PHY ID
that is set by the network driver by intercepting the MDIO PHY ID read.
v2:
- Create dedicated PHY driver and use a fake PHY ID that is injected by
the network driver. Suggested by Andrew Lunn.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 9 Aug 2019 18:44:22 +0000 (20:44 +0200)]
net: phy: add phy_modify_paged_changed
Add helper function phy_modify_paged_changed, behavios is the same
as for phy_modify_changed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 9 Aug 2019 18:43:50 +0000 (20:43 +0200)]
net: phy: prepare phylib to deal with PHY's extending Clause 22
The integrated PHY in 2.5Gbps chip RTL8125 is the first (known to me)
PHY that uses standard Clause 22 for all modes up to 1Gbps and adds
2.5Gbps control using vendor-specific registers. To use phylib for
the standard part little extensions are needed:
- Move most of genphy_config_aneg to a new function
__genphy_config_aneg that takes a parameter whether restarting
auto-negotiation is needed (depending on whether content of
vendor-specific advertisement register changed).
- Don't clear phydev->lp_advertising in genphy_read_status so that
we can set non-C22 mode flags before.
Basically both changes mimic the behavior of the equivalent Clause 45
functions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 9 Aug 2019 18:43:04 +0000 (20:43 +0200)]
net: phy: simplify genphy_config_advert by using the linkmode_adv_to_xxx_t functions
Using linkmode_adv_to_mii_adv_t and linkmode_adv_to_mii_ctrl1000_t
allows to simplify the code. In addition avoiding the conversion to
the legacy u32 advertisement format allows to remove the warning.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 9 Aug 2019 11:05:12 +0000 (13:05 +0200)]
netdevsim: register couple of devlink params
Register couple of devlink params, one generic, one driver-specific.
Make the values available over debugfs.
Example:
$ echo "111" > /sys/bus/netdevsim/new_device
$ devlink dev param
netdevsim/netdevsim111:
name max_macs type generic
values:
cmode driverinit value 32
name test1 type driver-specific
values:
cmode driverinit value true
$ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs
32
$ cat /sys/kernel/debug/netdevsim/netdevsim111/test1
Y
$ devlink dev param set netdevsim/netdevsim111 name max_macs cmode driverinit value 16
$ devlink dev param set netdevsim/netdevsim111 name test1 cmode driverinit value false
$ devlink dev reload netdevsim/netdevsim111
$ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs
16
$ cat /sys/kernel/debug/netdevsim/netdevsim111/test1
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Aug 2019 17:53:31 +0000 (10:53 -0700)]
Merge branch 'drop_monitor-Capture-dropped-packets-and-metadata'
Ido Schimmel says:
====================
drop_monitor: Capture dropped packets and metadata
So far drop monitor supported only one mode of operation in which a
summary of recent packet drops is periodically sent to user space as a
netlink event. The event only includes the drop location (program
counter) and number of drops in the last interval.
While this mode of operation allows one to understand if the system is
dropping packets, it is not sufficient if a more detailed analysis is
required. Both the packet itself and related metadata are missing.
This patchset extends drop monitor with another mode of operation where
the packet - potentially truncated - and metadata (e.g., drop location,
timestamp, netdev) are sent to user space as a netlink event. Thanks to
the extensible nature of netlink, more metadata can be added in the
future.
To avoid performing expensive operations in the context in which
kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU
skb drop list. The list is then processed in process context (using a
workqueue), where the netlink messages are allocated, prepared and
finally sent to user space.
A follow-up patchset will integrate drop monitor with devlink and allow
the latter to call into drop monitor to report hardware drops. In the
future, XDP drops can be added as well, thereby making drop monitor the
go-to netlink channel for diagnosing all packet drops.
Example usage with patched dropwatch [1] can be found here [2]. Example
dissection of drop monitor netlink events with patched wireshark [3] can
be found here [4]. I will submit both changes upstream after the kernel
changes are accepted. Another change worth making is adding a dropmon
pseudo interface to libpcap, similar to the nflog interface [5]. This
will allow users to specifically listen on dropmon traffic instead of
capturing all netlink packets via the nlmon netdev.
Patches #1-#5 prepare the code towards the actual changes in later
patches.
Patch #6 adds another mode of operation to drop monitor in which the
dropped packet itself is notified to user space along with metadata.
Patch #7 allows users to truncate reported packets to a specific length,
in case only the headers are of interest. The original length of the
packet is added as metadata to the netlink notification.
Patch #8 allows user to query the current configuration of drop monitor
(e.g., alert mode, truncation length).
Patches #9-#10 allow users to tune the length of the per-CPU skb drop
list according to their needs.
Changes since v1 [6]:
* Add skb protocol as metadata. This allows user space to correctly
dissect the packet instead of blindly assuming it is an Ethernet
packet
Changes since RFC [7]:
* Limit the length of the per-CPU skb drop list and make it configurable
* Do not use the hysteresis timer in packet alert mode
* Introduce alert mode operations in a separate patch and only then
introduce the new alert mode
* Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside
a union with 'dev_scratch' and therefore not guaranteed to point to a
valid netdev
* Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop
monitor while it is monitoring
* Did not change schedule_work() in favor of schedule_work_on() as I did
not observe a change in number of tail drops
[1] https://github.com/idosch/dropwatch/tree/packet-mode
[2] https://gist.github.com/idosch/
3d524b887e16bc11b4b19e25c23dcc23#file-gistfile1-txt
[3] https://github.com/idosch/wireshark/tree/drop-monitor-v2
[4] https://gist.github.com/idosch/
3d524b887e16bc11b4b19e25c23dcc23#file-gistfile2-txt
[5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c
[6] https://patchwork.ozlabs.org/cover/1143443/
[7] https://patchwork.ozlabs.org/cover/1135226/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:55 +0000 (10:35 +0300)]
drop_monitor: Expose tail drop counter
Previous patch made the length of the per-CPU skb drop list
configurable. Expose a counter that shows how many packets could not be
enqueued to this list.
This allows users determine the desired queue length.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:54 +0000 (10:35 +0300)]
drop_monitor: Make drop queue length configurable
In packet alert mode, each CPU holds a list of dropped skbs that need to
be processed in process context and sent to user space. To avoid
exhausting the system's memory the maximum length of this queue is
currently set to 1000.
Allow users to tune the length of this queue according to their needs.
The configured length is reported to user space when drop monitor
configuration is queried.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:53 +0000 (10:35 +0300)]
drop_monitor: Add a command to query current configuration
Users should be able to query the current configuration of drop monitor
before they start using it. Add a command to query the existing
configuration which currently consists of alert mode and packet
truncation length.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:52 +0000 (10:35 +0300)]
drop_monitor: Allow truncation of dropped packets
When sending dropped packets to user space it is not always necessary to
copy the entire packet as usually only the headers are of interest.
Allow user to specify the truncation length and add the original length
of the packet as additional metadata to the netlink message.
By default no truncation is performed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:51 +0000 (10:35 +0300)]
drop_monitor: Add packet alert mode
So far drop monitor supported only one alert mode in which a summary of
locations in which packets were recently dropped was sent to user space.
This alert mode is sufficient in order to understand that packets were
dropped, but lacks information to perform a more detailed analysis.
Add a new alert mode in which the dropped packet itself is passed to
user space along with metadata: The drop location (as program counter
and resolved symbol), ingress netdevice and drop timestamp. More
metadata can be added in the future.
To avoid performing expensive operations in the context in which
kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and
queued on per-CPU skb drop list. Then, in process context the netlink
message is allocated, prepared and finally sent to user space.
The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting
the system's memory. Subsequent patches will make this limit
configurable and also add a counter that indicates how many skbs were
tail dropped.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:50 +0000 (10:35 +0300)]
drop_monitor: Add alert mode operations
The next patch is going to add another alert mode in which the dropped
packet is notified to user space, instead of only a summary of recent
drops.
Abstract the differences between the modes by adding alert mode
operations. The operations are selected based on the currently
configured mode and associated with the probes and the work item just
before tracing starts.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:49 +0000 (10:35 +0300)]
drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration
Currently, the configure command does not do anything but return an
error. Subsequent patches will enable the command to change various
configuration options such as alert mode and packet truncation.
Similar to other netlink-based configuration channels, make sure only
users with the CAP_NET_ADMIN capability set can execute this command.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:48 +0000 (10:35 +0300)]
drop_monitor: Reset per-CPU data before starting to trace
The function reset_per_cpu_data() allocates and prepares a new skb for
the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is
stored in the per-CPU 'data' variable and the old is returned.
The function is invoked during module initialization and from the
workqueue, before an alert is sent. This means that it is possible to
receive an alert with stale data, if we stopped tracing when the
hysteresis timer ('data->send_timer') was pending.
Instead of invoking the function during module initialization, invoke it
just before we start tracing and ensure we get a fresh skb.
This also allows us to remove the calls to initialize the timer and the
work item from the module initialization path, since both could have
been triggered by the error paths of reset_per_cpu_data().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:47 +0000 (10:35 +0300)]
drop_monitor: Initialize timer and work item upon tracing enable
The timer and work item are currently initialized once during module
init, but subsequent patches will need to associate different functions
with the work item, based on the configured alert mode.
Allow subsequent patches to make that change by initializing and
de-initializing these objects during tracing enable and disable.
This also guarantees that once the request to disable tracing returns,
no more netlink notifications will be generated.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 11 Aug 2019 07:35:46 +0000 (10:35 +0300)]
drop_monitor: Split tracing enable / disable to different functions
Subsequent patches will need to enable / disable tracing based on the
configured alerting mode.
Reduce the nesting level and prepare for the introduction of this
functionality by splitting the tracing enable / disable operations into
two different functions.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 10 Aug 2019 22:25:49 +0000 (15:25 -0700)]
Merge branch 'Networking-driver-debugfs-cleanups'
Greg Kroah-Hartman says:
====================
Networking driver debugfs cleanups
There is no need to test the result of any debugfs call anymore. The
debugfs core warns the user if something fails, and the return value of
a debugfs call can always be fed back into another debugfs call with no
problems.
Also, debugfs is for debugging, so if there are problems with debugfs
(i.e. the system is out of memory) the rest of the kernel should not
change behavior, so testing for debugfs calls is pointless and not the
goal of debugfs at all.
This series cleans up a lot of networking drivers and some wimax code
that was calling debugfs and trying to do something with the return
value that it didn't need to. Removing this logic makes the code
smaller, easier to understand, and use less run-time memory in some
cases, all good things.
The series is against net-next, and have no dependancies between any of
them if they want to go through any random tree/order. Or, if wanted,
I can take them through my driver-core tree where other debugfs cleanups
are being slowly fed during major merge windows.
v3: fix build warning in i2400m, I thought I had caught them all :(
add acks from some reviewers
v2: fix up build warnings, it's as if I never even built these. Ugh, so
sorry for wasting people's time with the v1 series. I need to stop
relying on 0-day as it isn't working well anymore :(
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:32 +0000 (12:17 +0200)]
ieee802154: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Harry Morris <h.morris@cascoda.com>
Cc: linux-wpan@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:31 +0000 (12:17 +0200)]
ixgbe: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:30 +0000 (12:17 +0200)]
i40e: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:29 +0000 (12:17 +0200)]
fm10k: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:28 +0000 (12:17 +0200)]
mvpp2: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Huckleberry <nhuck@google.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:27 +0000 (12:17 +0200)]
skge: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:26 +0000 (12:17 +0200)]
qca: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Michael Heimpold <michael.heimpold@i2se.com>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:25 +0000 (12:17 +0200)]
dpaa2: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Because we don't care about the individual files, we can remove the
stored dentry for the files, as they are not needed to be kept track of
at all.
Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:24 +0000 (12:17 +0200)]
stmmac: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Because we don't care about the individual files, we can remove the
stored dentry for the files, as they are not needed to be kept track of
at all.
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sat, 10 Aug 2019 10:17:23 +0000 (12:17 +0200)]
nfp: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Edwin Peer <edwin.peer@netronome.com>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: oss-drivers@netronome.com
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>