platform/kernel/linux-starfive.git
2 years agoMerge tag 'mlx5-fixes-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 1 Oct 2021 10:19:40 +0000 (11:19 +0100)]
Merge tag 'mlx5-fixes-2021-09-30' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-09-30

This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: Use struct_size() helper in kvmalloc()
Gustavo A. R. Silva [Wed, 29 Sep 2021 20:17:18 +0000 (15:17 -0500)]
net: sched: Use struct_size() helper in kvmalloc()

Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst scenario, could lead to heap overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210929201718.GA342296@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 30 Sep 2021 21:49:21 +0000 (14:49 -0700)]
Merge git://git./linux/kernel/git/netdev/net

drivers/net/phy/bcm7xxx.c
  d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations")
  f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165")

net/sched/sch_api.c
  b193e15ac69d ("net: prevent user from passing illegal stab size")
  69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers")

Both cases trivial - adjacent code additions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 Sep 2021 21:28:05 +0000 (14:28 -0700)]
Merge tag 'net-5.15-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from mac80211, netfilter and bpf.

  Current release - regressions:

   - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
     interrupt

   - mdio: revert mechanical patches which broke handling of optional
     resources

   - dev_addr_list: prevent address duplication

  Previous releases - regressions:

   - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
     (NULL deref)

   - Revert "mac80211: do not use low data rates for data frames with no
     ack flag", fixing broadcast transmissions

   - mac80211: fix use-after-free in CCMP/GCMP RX

   - netfilter: include zone id in tuple hash again, minimize collisions

   - netfilter: nf_tables: unlink table before deleting it (race -> UAF)

   - netfilter: log: work around missing softdep backend module

   - mptcp: don't return sockets in foreign netns

   - sched: flower: protect fl_walk() with rcu (race -> UAF)

   - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup

   - smsc95xx: fix stalled rx after link change

   - enetc: fix the incorrect clearing of IF_MODE bits

   - ipv4: fix rtnexthop len when RTA_FLOW is present

   - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
     SKU

   - e100: fix length calculation & buffer overrun in ethtool::get_regs

  Previous releases - always broken:

   - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx

   - mac80211: drop frames from invalid MAC address in ad-hoc mode

   - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
     -> UAF)

   - bpf, x86: Fix bpf mapping of atomic fetch implementation

   - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog

   - netfilter: ip6_tables: zero-initialize fragment offset

   - mhi: fix error path in mhi_net_newlink

   - af_unix: return errno instead of NULL in unix_create1() when over
     the fs.file-max limit

  Misc:

   - bpf: exempt CAP_BPF from checks against bpf_jit_limit

   - netfilter: conntrack: make max chain length random, prevent
     guessing buckets by attackers

   - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
     generic, defer conntrack walk to work queue (prevent hogging RTNL
     lock)"

* tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
  net: stmmac: fix EEE init issue when paired with EEE capable PHYs
  net: dev_addr_list: handle first address in __hw_addr_add_ex
  net: sched: flower: protect fl_walk() with rcu
  net: introduce and use lock_sock_fast_nested()
  net: phy: bcm7xxx: Fixed indirect MMD operations
  net: hns3: disable firmware compatible features when uninstall PF
  net: hns3: fix always enable rx vlan filter problem after selftest
  net: hns3: PF enable promisc for VF when mac table is overflow
  net: hns3: fix show wrong state when add existing uc mac address
  net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
  net: hns3: don't rollback when destroy mqprio fail
  net: hns3: remove tc enable checking
  net: hns3: do not allow call hns3_nic_net_open repeatedly
  ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
  net: bridge: mcast: Associate the seqcount with its protecting lock.
  net: mdio-ipq4019: Fix the error for an optional regs resource
  net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
  net: mdio: mscc-miim: Fix the mdio controller
  af_unix: Return errno instead of NULL in unix_create1().
  ...

2 years agonet/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode
Aya Levin [Mon, 13 Sep 2021 13:49:47 +0000 (16:49 +0300)]
net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode

TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
conflicts with MQPRIO in channel mode, which specifies explicitly which
TC accepts the packet. This patch mutually excludes the above
configuration.

Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Fix the presented RQ index in PTP stats
Lama Kayal [Sun, 29 Aug 2021 08:26:03 +0000 (11:26 +0300)]
net/mlx5e: Fix the presented RQ index in PTP stats

PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.

Before applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq15_packets: 0
     ptp_rq8_bytes: 0
     ptp_rq6_csum_complete: 0
     ptp_rq14_csum_complete_tail: 0
     ptp_rq3_csum_complete_tail_slow : 0
     ptp_rq9_csum_unnecessary: 0
     ptp_rq1_csum_unnecessary_inner: 0
     ptp_rq7_csum_none: 0
     ptp_rq10_xdp_drop: 0
     ptp_rq9_xdp_redirect: 0
     ptp_rq13_lro_packets: 0
     ptp_rq12_lro_bytes: 0
     ptp_rq10_ecn_mark: 0
     ptp_rq9_removed_vlan_packets: 0
     ptp_rq5_wqe_err: 0
     ptp_rq8_mpwqe_filler_cqes: 0
     ptp_rq2_mpwqe_filler_strides: 0
     ptp_rq5_oversize_pkts_sw_drop: 0
     ptp_rq6_buff_alloc_err: 0
     ptp_rq15_cqe_compress_blks: 0
     ptp_rq2_cqe_compress_pkts: 0
     ptp_rq2_cache_reuse: 0
     ptp_rq12_cache_full: 0
     ptp_rq11_cache_empty: 256
     ptp_rq12_cache_busy: 0
     ptp_rq11_cache_waive: 0
     ptp_rq12_congst_umr: 0
     ptp_rq11_arfs_err: 0
     ptp_rq9_recover: 0

After applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq0_packets: 0
     ptp_rq0_bytes: 0
     ptp_rq0_csum_complete: 0
     ptp_rq0_csum_complete_tail: 0
     ptp_rq0_csum_complete_tail_slow : 0
     ptp_rq0_csum_unnecessary: 0
     ptp_rq0_csum_unnecessary_inner: 0
     ptp_rq0_csum_none: 0
     ptp_rq0_xdp_drop: 0
     ptp_rq0_xdp_redirect: 0
     ptp_rq0_lro_packets: 0
     ptp_rq0_lro_bytes: 0
     ptp_rq0_ecn_mark: 0
     ptp_rq0_removed_vlan_packets: 0
     ptp_rq0_wqe_err: 0
     ptp_rq0_mpwqe_filler_cqes: 0
     ptp_rq0_mpwqe_filler_strides: 0
     ptp_rq0_oversize_pkts_sw_drop: 0
     ptp_rq0_buff_alloc_err: 0
     ptp_rq0_cqe_compress_blks: 0
     ptp_rq0_cqe_compress_pkts: 0
     ptp_rq0_cache_reuse: 0
     ptp_rq0_cache_full: 0
     ptp_rq0_cache_empty: 256
     ptp_rq0_cache_busy: 0
     ptp_rq0_cache_waive: 0
     ptp_rq0_congst_umr: 0
     ptp_rq0_arfs_err: 0
     ptp_rq0_recover: 0

Fixes: a28359e922c6 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Fix setting number of EQs of SFs
Shay Drory [Tue, 14 Sep 2021 07:13:02 +0000 (10:13 +0300)]
net/mlx5: Fix setting number of EQs of SFs

When setting number of completion EQs of the SF, consider number of
online CPUs.
Without this consideration, when number of online cpus are less than 8,
unnecessary 8 completion EQs are allocated.

Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Fix length of irq_index in chars
Shay Drory [Thu, 19 Aug 2021 13:01:28 +0000 (16:01 +0300)]
net/mlx5: Fix length of irq_index in chars

The maximum irq_index can be 2047, This means irq_name should have 4
characters reserve for the irq_index. Hence, increase it to 4.

Fixes: 3af26495a247 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Avoid generating event after PPS out in Real time mode
Aya Levin [Thu, 23 Sep 2021 12:30:01 +0000 (15:30 +0300)]
net/mlx5: Avoid generating event after PPS out in Real time mode

When in Real-time mode, HW clock is synced with the PTP daemon. Hence
driver should not re-calibrate the next pulse (via MTPPSE repetitive
events mechanism).

This patch arms repetitive events only in free-running mode.

Fixes: 432119de33d9 ("net/mlx5: Add cyc2time HW translation mode support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Force round second at 1PPS out start time
Aya Levin [Thu, 23 Sep 2021 13:56:09 +0000 (16:56 +0300)]
net/mlx5: Force round second at 1PPS out start time

Allow configuration of 1PPS start time only with time-stamp representing
a round second. Prior to this patch driver allowed setting of a
non-round-second which is not supported by the device. Avoid unexpected
behavior by restricting start-time configuration to a round-second.

Fixes: 4272f9b88db9 ("net/mlx5e: Change 1PPS out scheme")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-Switch, Fix double allocation of acl flow counter
Moshe Shemesh [Thu, 23 Sep 2021 14:57:47 +0000 (17:57 +0300)]
net/mlx5: E-Switch, Fix double allocation of acl flow counter

Flow counter is allocated in eswitch legacy acl setting functions
without checking if already allocated by previous setting. Add a check
to avoid such double allocation.

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: ea651a86d468 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Improve MQPRIO resiliency
Tariq Toukan [Wed, 29 Sep 2021 12:51:26 +0000 (15:51 +0300)]
net/mlx5e: Improve MQPRIO resiliency

* Add netdev->tc_to_txq rollback in case of failure in
  mlx5e_update_netdev_queues().
* Fix broken transition between the two modes:
  MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
* Disable MQPRIO channel mode if re-attaching with a different number
  of channels.
* Improve code sharing.

Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Keep the value for maximum number of channels in-sync
Tariq Toukan [Thu, 2 Sep 2021 07:33:32 +0000 (10:33 +0300)]
net/mlx5e: Keep the value for maximum number of channels in-sync

The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.

To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.

Use mlx5e_get_max_num_channels() for calculating the number of netdev's
rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
coinsiders core device resources, profile, and netdev) to init or
update priv->max_nch.

Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.

Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.

There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.

Fixes: a1f240f18017 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: IPSEC RX, enable checksum complete
Raed Salem [Thu, 26 Aug 2021 14:07:17 +0000 (17:07 +0300)]
net/mlx5e: IPSEC RX, enable checksum complete

Currently in Rx data path IPsec crypto offloaded packets uses
csum_none flag, so checksum is handled by the stack, this naturally
have some performance/cpu utilization impact on such flows. As Nvidia
NIC starting from ConnectX6DX provides checksum complete value out of
the box also for such flows there is no sense in taking csum_none path,
furthermore the stack (xfrm) have the method to handle checksum complete
corrections for such flows i.e. IPsec trailer removal and consequently
checksum value adjustment.

Because of the above and in addition the ConnectX6DX is the first HW
which supports IPsec crypto offload then it is safe to report csum
complete for IPsec offloaded traffic.

Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agoMerge tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 30 Sep 2021 19:11:35 +0000 (12:11 -0700)]
Merge tag 'gpio-fixes-for-v5.15-rc4' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "A single fix for the gpio-pca953x driver and two commits updating the
  MAINTAINERS entries for Mun Yew Tham (GPIO specific) and myself
  (treewide after a change in professional situation).

  Summary:

   - don't ignore I2C errors in gpio-pca953x

   - update MAINTAINERS entries for Mun Yew Tham and myself"

* tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer
  MAINTAINERS: update my email address
  gpio: pca953x: do not ignore i2c errors

2 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Thu, 30 Sep 2021 19:00:46 +0000 (12:00 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Not much too exciting here, although two syzkaller bugs that seem to
  have 9 lives may have finally been squashed.

  Several core bugs and a batch of driver bug fixes:

   - Fix compilation problems in qib and hfi1

   - Do not corrupt the joined multicast group state when using
     SEND_ONLY

   - Several CMA bugs, a reference leak for listening and two syzkaller
     crashers

   - Various bug fixes for irdma

   - Fix a Sleeping while atomic bug in usnic

   - Properly sanitize kernel pointers in dmesg

   - Two bugs in the 64b CQE support for hns"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Add the check of the CQE size of the user space
  RDMA/hns: Fix the size setting error when copying CQE in clean_cq()
  RDMA/hfi1: Fix kernel pointer leak
  RDMA/usnic: Lock VF with mutex instead of spinlock
  RDMA/hns: Work around broken constant propagation in gcc 8
  RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests
  RDMA/cma: Do not change route.addr.src_addr.ss_family
  RDMA/irdma: Report correct WC error when there are MW bind errors
  RDMA/irdma: Report correct WC error when transport retry counter is exceeded
  RDMA/irdma: Validate number of CQ entries on create CQ
  RDMA/irdma: Skip CQP ring during a reset
  MAINTAINERS: Update Broadcom RDMA maintainers
  RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure
  IB/cma: Do not send IGMP leaves for sendonly Multicast groups
  IB/qib: Fix clang confusion of NULL pointer comparison

2 years agoaf_unix: fix races in sk_peer_pid and sk_peer_cred accesses
Eric Dumazet [Wed, 29 Sep 2021 22:57:50 +0000 (15:57 -0700)]
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses

Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.

In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.

Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.

Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'snmp-optimizations'
David S. Miller [Thu, 30 Sep 2021 13:17:10 +0000 (14:17 +0100)]
Merge branch 'snmp-optimizations'

Eric Dumazet says:

====================
net: snmp: minor optimizations

Fetching many SNMP counters on hosts with large number of cpus
takes a lot of time. mptcp still uses the old non-batched
fashion which is not cache friendly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomptcp: use batch snmp operations in mptcp_seq_show()
Eric Dumazet [Thu, 30 Sep 2021 01:03:33 +0000 (18:03 -0700)]
mptcp: use batch snmp operations in mptcp_seq_show()

Using snmp_get_cpu_field_batch() allows for better cpu cache
utilization, especially on hosts with large number of cpus.

Also remove special handling when mptcp mibs where not yet
allocated.

I chose to use temporary storage on the stack to keep this patch simple.
We might in the future use the storage allocated in netstat_seq_show().

Combined with prior patch (inlining snmp_get_cpu_field)
time to fetch and output mptcp counters on a 256 cpu host [1]
goes from 75 usec to 16 usec.

[1] L1 cache size is 32KB, it is not big enough to hold all dataset.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: snmp: inline snmp_get_cpu_field()
Eric Dumazet [Thu, 30 Sep 2021 01:03:32 +0000 (18:03 -0700)]
net: snmp: inline snmp_get_cpu_field()

This trivial function is called ~90,000 times on 256 cpus hosts,
when reading /proc/net/netstat. And this number keeps inflating.

Inlining it saves many cycles.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/mlx4_en: Add XDP_REDIRECT statistics
Joshua Roys [Thu, 30 Sep 2021 02:30:23 +0000 (22:30 -0400)]
net/mlx4_en: Add XDP_REDIRECT statistics

Add counters for XDP REDIRECT success and failure. This brings the
redirect path in line with metrics gathered via the other XDP paths.

Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: fix EEE init issue when paired with EEE capable PHYs
Wong Vee Khee [Thu, 30 Sep 2021 06:44:36 +0000 (14:44 +0800)]
net: stmmac: fix EEE init issue when paired with EEE capable PHYs

When STMMAC is paired with Energy-Efficient Ethernet(EEE) capable PHY,
and the PHY is advertising EEE by default, we need to enable EEE on the
xPCS side too, instead of having user to manually trigger the enabling
config via ethtool.

Fixed this by adding xpcs_config_eee() call in stmmac_eee_init().

Fixes: 7617af3d1a5e ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoixgbe: let the xdpdrv work with more than 64 cpus
Jason Xing [Wed, 29 Sep 2021 17:56:05 +0000 (10:56 -0700)]
ixgbe: let the xdpdrv work with more than 64 cpus

Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the
server is equipped with more than 64 cpus online. So it turns out that
the loading of xdpdrv causes the "NOMEM" failure.

Actually, we can adjust the algorithm and then make it work through
mapping the current cpu to some xdp ring with the protect of @tx_lock.

Here are some numbers before/after applying this patch with xdp-example
loaded on the eth0X:

As client (tx path):
                     Before    After
TCP_STREAM send-64   734.14    714.20
TCP_STREAM send-128  1401.91   1395.05
TCP_STREAM send-512  5311.67   5292.84
TCP_STREAM send-1k   9277.40   9356.22 (not stable)
TCP_RR     send-1    22559.75  21844.22
TCP_RR     send-128  23169.54  22725.13
TCP_RR     send-512  21670.91  21412.56

As server (rx path):
                     Before    After
TCP_STREAM send-64   1416.49   1383.12
TCP_STREAM send-128  3141.49   3055.50
TCP_STREAM send-512  9488.73   9487.44
TCP_STREAM send-1k   9491.17   9356.22 (not stable)
TCP_RR     send-1    23617.74  23601.60
...

Notice: the TCP_RR mode is unstable as the official document explains.

I tested many times with different parameters combined through netperf.
Though the result is not that accurate, I cannot see much influence on
this patch. The static key is places on the hot path, but it actually
shouldn't cause a huge regression theoretically.

Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'SO_RESEVED_MEM'
David S. Miller [Thu, 30 Sep 2021 12:36:46 +0000 (13:36 +0100)]
Merge branch 'SO_RESEVED_MEM'

Wei Wang says:

====================
net: add new socket option SO_RESERVE_MEM

This patch series introduces a new socket option SO_RESERVE_MEM.
This socket option provides a mechanism for users to reserve a certain
amount of memory for the socket to use. When this option is set, kernel
charges the user specified amount of memory to memcg, as well as
sk_forward_alloc. This amount of memory is not reclaimable and is
available in sk_forward_alloc for this socket.
With this socket option set, the networking stack spends less cycles
doing forward alloc and reclaim, which should lead to better system
performance, with the cost of an amount of pre-allocated and
unreclaimable memory, even under memory pressure.
With a tcp_stream test with 10 flows running on a simulated 100ms RTT
link, I can see the cycles spent in __sk_mem_raise_allocated() dropping
by ~0.02%. Not a whole lot, since we already have logic in
sk_mem_uncharge() to only reclaim 1MB when sk_forward_alloc has more
than 2MB free space. But on a system suffering memory pressure
constently, the savings should be more.

The first patch is the implementation of this socket option. The
following 2 patches change the tcp stack to make use of this reserved
memory when under memory pressure. This makes the tcp stack behavior
more flexible when under memory pressure, and provides a way for user to
control the distribution of the memory among its sockets.
With a TCP connection on a simulated 100ms RTT link, the default
throughput under memory pressure is ~500Kbps. With SO_RESERVE_MEM set to
100KB, the throughput under memory pressure goes up to ~3.5Mbps.

Change since v2:
- Added description for new field added in struct sock in patch 1
Change since v1:
- Added performance stats in cover letter and rebased
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: adjust rcv_ssthresh according to sk_reserved_mem
Wei Wang [Wed, 29 Sep 2021 17:25:13 +0000 (10:25 -0700)]
tcp: adjust rcv_ssthresh according to sk_reserved_mem

When user sets SO_RESERVE_MEM socket option, in order to utilize the
reserved memory when in memory pressure state, we adjust rcv_ssthresh
according to the available reserved memory for the socket, instead of
using 4 * advmss always.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: adjust sndbuf according to sk_reserved_mem
Wei Wang [Wed, 29 Sep 2021 17:25:12 +0000 (10:25 -0700)]
tcp: adjust sndbuf according to sk_reserved_mem

If user sets SO_RESERVE_MEM socket option, in order to fully utilize the
reserved memory in memory pressure state on the tx path, we modify the
logic in sk_stream_moderate_sndbuf() to set sk_sndbuf according to
available reserved memory, instead of MIN_SOCK_SNDBUF, and adjust it
when new data is acked.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: add new socket option SO_RESERVE_MEM
Wei Wang [Wed, 29 Sep 2021 17:25:11 +0000 (10:25 -0700)]
net: add new socket option SO_RESERVE_MEM

This socket option provides a mechanism for users to reserve a certain
amount of memory for the socket to use. When this option is set, kernel
charges the user specified amount of memory to memcg, as well as
sk_forward_alloc. This amount of memory is not reclaimable and is
available in sk_forward_alloc for this socket.
With this socket option set, the networking stack spends less cycles
doing forward alloc and reclaim, which should lead to better system
performance, with the cost of an amount of pre-allocated and
unreclaimable memory, even under memory pressure.

Note:
This socket option is only available when memory cgroup is enabled and we
require this reserved memory to be charged to the user's memcg. We hope
this could avoid mis-behaving users to abused this feature to reserve a
large amount on certain sockets and cause unfairness for others.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dev_addr_list: handle first address in __hw_addr_add_ex
Jakub Kicinski [Wed, 29 Sep 2021 15:32:24 +0000 (08:32 -0700)]
net: dev_addr_list: handle first address in __hw_addr_add_ex

struct dev_addr_list is used for device addresses, unicast addresses
and multicast addresses. The first of those needs special handling
of the main address - netdev->dev_addr points directly the data
of the entry and drivers write to it freely, so we can't maintain
it in the rbtree (for now, at least, to be fixed in net-next).

Current work around sprinkles special handling of the first
address on the list throughout the code but it missed the case
where address is being added. First address will not be visible
during subsequent adds.

Syzbot found a warning where unicast addresses are modified
without holding the rtnl lock, tl;dr is that team generates
the same modification multiple times, not necessarily when
right locks are held.

In the repro we have:

  macvlan -> team -> veth

macvlan adds a unicast address to the team. Team then pushes
that address down to its memebers (veths). Next something unrelated
makes team sync member addrs again, and because of the bug
the addr entries get duplicated in the veths. macvlan gets
removed, removes its addr from team which removes only one
of the duplicated addresses from veths. This removal is done
under rtnl. Next syzbot uses iptables to add a multicast addr
to team (which does not hold rtnl lock). Team syncs veth addrs,
but because veths' unicast list still has the duplicate it will
also get sync, even though this update is intended for mc addresses.
Again, uc address updates need rtnl lock, boom.

Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com
Fixes: 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: marvell10g: add downshift tunable support
Russell King [Wed, 29 Sep 2021 15:28:53 +0000 (16:28 +0100)]
net: phy: marvell10g: add downshift tunable support

Add support for the downshift tunable for the Marvell 88x3310 PHY.
Downshift is only usable with firmware 0.3.5.0 and later.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: flower: protect fl_walk() with rcu
Vlad Buslov [Wed, 29 Sep 2021 15:08:49 +0000 (18:08 +0300)]
net: sched: flower: protect fl_walk() with rcu

Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
also removed rcu protection of individual filters which causes following
use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain
rcu read lock while iterating and taking the filter reference and temporary
release the lock while calling arg->fn() callback that can sleep.

KASAN trace:

[  352.773640] ==================================================================
[  352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
[  352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987

[  352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2
[  352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  352.781022] Call Trace:
[  352.781573]  dump_stack_lvl+0x46/0x5a
[  352.782332]  print_address_description.constprop.0+0x1f/0x140
[  352.783400]  ? fl_walk+0x159/0x240 [cls_flower]
[  352.784292]  ? fl_walk+0x159/0x240 [cls_flower]
[  352.785138]  kasan_report.cold+0x83/0xdf
[  352.785851]  ? fl_walk+0x159/0x240 [cls_flower]
[  352.786587]  kasan_check_range+0x145/0x1a0
[  352.787337]  fl_walk+0x159/0x240 [cls_flower]
[  352.788163]  ? fl_put+0x10/0x10 [cls_flower]
[  352.789007]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[  352.790102]  tcf_chain_dump+0x231/0x450
[  352.790878]  ? tcf_chain_tp_delete_empty+0x170/0x170
[  352.791833]  ? __might_sleep+0x2e/0xc0
[  352.792594]  ? tfilter_notify+0x170/0x170
[  352.793400]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[  352.794477]  tc_dump_tfilter+0x385/0x4b0
[  352.795262]  ? tc_new_tfilter+0x1180/0x1180
[  352.796103]  ? __mod_node_page_state+0x1f/0xc0
[  352.796974]  ? __build_skb_around+0x10e/0x130
[  352.797826]  netlink_dump+0x2c0/0x560
[  352.798563]  ? netlink_getsockopt+0x430/0x430
[  352.799433]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[  352.800542]  __netlink_dump_start+0x356/0x440
[  352.801397]  rtnetlink_rcv_msg+0x3ff/0x550
[  352.802190]  ? tc_new_tfilter+0x1180/0x1180
[  352.802872]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  352.803668]  ? tc_new_tfilter+0x1180/0x1180
[  352.804344]  ? _copy_from_iter_nocache+0x800/0x800
[  352.805202]  ? kasan_set_track+0x1c/0x30
[  352.805900]  netlink_rcv_skb+0xc6/0x1f0
[  352.806587]  ? rht_deferred_worker+0x6b0/0x6b0
[  352.807455]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  352.808324]  ? netlink_ack+0x4d0/0x4d0
[  352.809086]  ? netlink_deliver_tap+0x62/0x3d0
[  352.809951]  netlink_unicast+0x353/0x480
[  352.810744]  ? netlink_attachskb+0x430/0x430
[  352.811586]  ? __alloc_skb+0xd7/0x200
[  352.812349]  netlink_sendmsg+0x396/0x680
[  352.813132]  ? netlink_unicast+0x480/0x480
[  352.813952]  ? __import_iovec+0x192/0x210
[  352.814759]  ? netlink_unicast+0x480/0x480
[  352.815580]  sock_sendmsg+0x6c/0x80
[  352.816299]  ____sys_sendmsg+0x3a5/0x3c0
[  352.817096]  ? kernel_sendmsg+0x30/0x30
[  352.817873]  ? __ia32_sys_recvmmsg+0x150/0x150
[  352.818753]  ___sys_sendmsg+0xd8/0x140
[  352.819518]  ? sendmsg_copy_msghdr+0x110/0x110
[  352.820402]  ? ___sys_recvmsg+0xf4/0x1a0
[  352.821110]  ? __copy_msghdr_from_user+0x260/0x260
[  352.821934]  ? _raw_spin_lock+0x81/0xd0
[  352.822680]  ? __handle_mm_fault+0xef3/0x1b20
[  352.823549]  ? rb_insert_color+0x2a/0x270
[  352.824373]  ? copy_page_range+0x16b0/0x16b0
[  352.825209]  ? perf_event_update_userpage+0x2d0/0x2d0
[  352.826190]  ? __fget_light+0xd9/0xf0
[  352.826941]  __sys_sendmsg+0xb3/0x130
[  352.827613]  ? __sys_sendmsg_sock+0x20/0x20
[  352.828377]  ? do_user_addr_fault+0x2c5/0x8a0
[  352.829184]  ? fpregs_assert_state_consistent+0x52/0x60
[  352.830001]  ? exit_to_user_mode_prepare+0x32/0x160
[  352.830845]  do_syscall_64+0x35/0x80
[  352.831445]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  352.832331] RIP: 0033:0x7f7bee973c17
[  352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17
[  352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003
[  352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40
[  352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f
[  352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088

[  352.843784] Allocated by task 2960:
[  352.844451]  kasan_save_stack+0x1b/0x40
[  352.845173]  __kasan_kmalloc+0x7c/0x90
[  352.845873]  fl_change+0x282/0x22db [cls_flower]
[  352.846696]  tc_new_tfilter+0x6cf/0x1180
[  352.847493]  rtnetlink_rcv_msg+0x471/0x550
[  352.848323]  netlink_rcv_skb+0xc6/0x1f0
[  352.849097]  netlink_unicast+0x353/0x480
[  352.849886]  netlink_sendmsg+0x396/0x680
[  352.850678]  sock_sendmsg+0x6c/0x80
[  352.851398]  ____sys_sendmsg+0x3a5/0x3c0
[  352.852202]  ___sys_sendmsg+0xd8/0x140
[  352.852967]  __sys_sendmsg+0xb3/0x130
[  352.853718]  do_syscall_64+0x35/0x80
[  352.854457]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  352.855830] Freed by task 7:
[  352.856421]  kasan_save_stack+0x1b/0x40
[  352.857139]  kasan_set_track+0x1c/0x30
[  352.857854]  kasan_set_free_info+0x20/0x30
[  352.858609]  __kasan_slab_free+0xed/0x130
[  352.859348]  kfree+0xa7/0x3c0
[  352.859951]  process_one_work+0x44d/0x780
[  352.860685]  worker_thread+0x2e2/0x7e0
[  352.861390]  kthread+0x1f4/0x220
[  352.862022]  ret_from_fork+0x1f/0x30

[  352.862955] Last potentially related work creation:
[  352.863758]  kasan_save_stack+0x1b/0x40
[  352.864378]  kasan_record_aux_stack+0xab/0xc0
[  352.865028]  insert_work+0x30/0x160
[  352.865617]  __queue_work+0x351/0x670
[  352.866261]  rcu_work_rcufn+0x30/0x40
[  352.866917]  rcu_core+0x3b2/0xdb0
[  352.867561]  __do_softirq+0xf6/0x386

[  352.868708] Second to last potentially related work creation:
[  352.869779]  kasan_save_stack+0x1b/0x40
[  352.870560]  kasan_record_aux_stack+0xab/0xc0
[  352.871426]  call_rcu+0x5f/0x5c0
[  352.872108]  queue_rcu_work+0x44/0x50
[  352.872855]  __fl_put+0x17c/0x240 [cls_flower]
[  352.873733]  fl_delete+0xc7/0x100 [cls_flower]
[  352.874607]  tc_del_tfilter+0x510/0xb30
[  352.886085]  rtnetlink_rcv_msg+0x471/0x550
[  352.886875]  netlink_rcv_skb+0xc6/0x1f0
[  352.887636]  netlink_unicast+0x353/0x480
[  352.888285]  netlink_sendmsg+0x396/0x680
[  352.888942]  sock_sendmsg+0x6c/0x80
[  352.889583]  ____sys_sendmsg+0x3a5/0x3c0
[  352.890311]  ___sys_sendmsg+0xd8/0x140
[  352.891019]  __sys_sendmsg+0xb3/0x130
[  352.891716]  do_syscall_64+0x35/0x80
[  352.892395]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  352.893666] The buggy address belongs to the object at ffff8881c8251000
                which belongs to the cache kmalloc-2k of size 2048
[  352.895696] The buggy address is located 1152 bytes inside of
                2048-byte region [ffff8881c8251000ffff8881c8251800)
[  352.897640] The buggy address belongs to the page:
[  352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250
[  352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0
[  352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[  352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00
[  352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[  352.905861] page dumped because: kasan: bad access detected

[  352.907323] Memory state around the buggy address:
[  352.908218]  ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.909471]  ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.912012]                    ^
[  352.912642]  ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.913919]  ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.915185] ==================================================================

Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-af: Remove redundant initialization of variable pin
Colin Ian King [Wed, 29 Sep 2021 13:27:53 +0000 (14:27 +0100)]
octeontx2-af: Remove redundant initialization of variable pin

The variable pin is being initialized with a value that is never
read, it is being updated later on in only one case of a switch
statement.  The assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: macb: ptp: Switch to gettimex64() interface
Lars-Peter Clausen [Wed, 29 Sep 2021 12:07:39 +0000 (14:07 +0200)]
net: macb: ptp: Switch to gettimex64() interface

The macb PTP support currently implements the `gettime64` callback to allow
to retrieve the hardware clock time. Update the implementation to provide
the `gettimex64` callback instead.

The difference between the two is that with `gettime64` a snapshot of the
system clock is taken before and after invoking the callback. Whereas
`gettimex64` expects the callback itself to take the snapshots.

To get the time from the macb Ethernet core multiple register accesses have
to be done. Only one of which will happen at the time reported by the
function. This leads to a non-symmetric delay and adds a slight offset
between the hardware and system clock time when using the `gettime64`
method. This offset can be a few 100 nanoseconds. Switching to the
`gettimex64` method allows for a more precise correlation of the hardware
and system clocks and results in a lower offset between the two.

On a Xilinx ZynqMP system `phc2sys` reports a delay of 1120 ns before and
300 ns after the patch. With the latter being mostly symmetric.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodissector: do not set invalid PPP protocol
Boris Sukholitko [Wed, 29 Sep 2021 11:32:23 +0000 (14:32 +0300)]
dissector: do not set invalid PPP protocol

The following flower filter fails to match non-PPP_IP{V6} packets
wrapped in PPP_SES protocol:

tc filter add dev eth0 ingress protocol ppp_ses flower \
        action simple sdata hi64

The reason is that proto local variable is being set even when
FLOW_DISSECT_RET_OUT_BAD status is returned.

The fix is to avoid setting proto variable if the PPP protocol is unknown.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: rtl8366rb: Use core filtering tracking
Linus Walleij [Wed, 29 Sep 2021 11:23:22 +0000 (13:23 +0200)]
net: dsa: rtl8366rb: Use core filtering tracking

We added a state variable to track whether a certain port
was VLAN filtering or not, but we can just inquire the DSA
core about this.

Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Alvin Å ipraga <alsi@bang-olufsen.dk>
Cc: Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: introduce and use lock_sock_fast_nested()
Paolo Abeni [Wed, 29 Sep 2021 09:59:17 +0000 (11:59 +0200)]
net: introduce and use lock_sock_fast_nested()

Syzkaller reported a false positive deadlock involving
the nl socket lock and the subflow socket lock:

MPTCP: kernel_bind error, err=-98
============================================
WARNING: possible recursive locking detected
5.15.0-rc1-syzkaller #0 Not tainted
--------------------------------------------
syz-executor998/6520 is trying to acquire lock:
ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738

but task is already holding lock:
ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(k-sk_lock-AF_INET);
  lock(k-sk_lock-AF_INET);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by syz-executor998/6520:
 #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802
 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790
 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720

stack backtrace:
CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline]
 check_deadlock kernel/locking/lockdep.c:2987 [inline]
 validate_chain kernel/locking/lockdep.c:3776 [inline]
 __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015
 lock_acquire kernel/locking/lockdep.c:5625 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
 lock_sock_fast+0x36/0x100 net/core/sock.c:3229
 mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
 __sock_release net/socket.c:649 [inline]
 sock_release+0x87/0x1b0 net/socket.c:677
 mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900
 mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170
 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
 genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 sock_no_sendpage+0x101/0x150 net/core/sock.c:2980
 kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504
 kernel_sendpage net/socket.c:3501 [inline]
 sock_sendpage+0xe5/0x140 net/socket.c:1003
 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
 splice_from_pipe_feed fs/splice.c:418 [inline]
 __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
 splice_from_pipe fs/splice.c:597 [inline]
 generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x110/0x180 fs/splice.c:936
 splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
 do_splice_direct+0x1b3/0x280 fs/splice.c:979
 do_sendfile+0xae9/0x1240 fs/read_write.c:1249
 __do_sys_sendfile64 fs/read_write.c:1314 [inline]
 __se_sys_sendfile64 fs/read_write.c:1300 [inline]
 __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f215cb69969
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08
R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000

the problem originates from uncorrect lock annotation in the mptcp
code and is only visible since commit 2dcb96bacce3 ("net: core: Correct
the sock::sk_lock.owned lockdep annotations"), but is present since
the port-based endpoint support initial implementation.

This patch addresses the issue introducing a nested variant of
lock_sock_fast() and using it in the relevant code path.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-pf: Add XDP support to netdev PF
Geetha sowjanya [Wed, 29 Sep 2021 09:54:56 +0000 (15:24 +0530)]
octeontx2-pf: Add XDP support to netdev PF

Adds XDP_PASS, XDP_TX, XDP_DROP and XDP_REDIRECT support
for netdev PF.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-af: Adjust LA pointer for cpt parse header
Kiran Kumar K [Wed, 29 Sep 2021 05:58:31 +0000 (11:28 +0530)]
octeontx2-af: Adjust LA pointer for cpt parse header

In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the
start of cpt parse header. Since cpt parse header has veriable
length padding, this will be a problem for DMAC extraction. Adding
KPU profile changes to adjust the LA pointer to start at ether header
in case of cpt parse header by
   - Adding ptr advance in pkind 58 to a fixed value 40
   - Adding variable length offset 7 and mask 7 (pad len in
     CPT_PARSE_HDR).
Also added the missing static declaration for npc_set_var_len_offset_pkind
function.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet_sched: Use struct_size() and flex_array_size() helpers
Gustavo A. R. Silva [Tue, 28 Sep 2021 19:31:07 +0000 (14:31 -0500)]
net_sched: Use struct_size() and flex_array_size() helpers

Make use of the struct_size() and flex_array_size() helpers instead of
an open-coded version, in order to avoid any potential type mistakes
or integer overflows that, in the worse scenario, could lead to heap
overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210928193107.GA262595@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer
Mun Yew Tham [Wed, 29 Sep 2021 00:49:11 +0000 (08:49 +0800)]
MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer

Update Altera Pio Driver maintainer's email from <joyce.ooi@intel.com> to <mun.yew.tham@intel.com>

Signed-off-by: Mun Yew Tham <mun.yew.tham@intel.com>
Acked-by: Joyce Ooi <joyce.ooi@intel.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2 years agoMAINTAINERS: update my email address
Bartosz Golaszewski [Mon, 20 Sep 2021 07:18:37 +0000 (09:18 +0200)]
MAINTAINERS: update my email address

My professional situation changes soon. Update my email address.

Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2 years agogpio: pca953x: do not ignore i2c errors
Andrey Gusakov [Thu, 23 Sep 2021 17:22:16 +0000 (20:22 +0300)]
gpio: pca953x: do not ignore i2c errors

Per gpio_chip interface, error shall be proparated to the caller.

Attempt to silent diagnostics by returning zero (as written in the
comment) is plain wrong, because the zero return can be interpreted by
the caller as the gpio value.

Cc: stable@vger.kernel.org
Signed-off-by: Andrey Gusakov <andrey.gusakov@cogentembedded.com>
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2 years agodevlink: Add missed notifications iterators
Leon Romanovsky [Wed, 29 Sep 2021 14:18:20 +0000 (17:18 +0300)]
devlink: Add missed notifications iterators

The commit mentioned in Fixes line missed a couple of notifications that
were registered before devlink_register() and should be delayed too.

As such, the too early placed WARN_ON() check spotted it.

WARNING: CPU: 1 PID: 6540 at net/core/devlink.c:5158 devlink_nl_region_notify+0x184/0x1e0 net/core/devlink.c:5158
Modules linked in:
CPU: 1 PID: 6540 Comm: syz-executor.0 Not tainted 5.15.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:devlink_nl_region_notify+0x184/0x1e0 net/core/devlink.c:5158
Code: 38 41 b8 c0 0c 00 00 31 d2 48 89 ee 4c 89 e7 e8 72 1a 26 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e e9 01 bd 41 fa
e8 fc bc 41 fa <0f> 0b e9 f7 fe ff ff e8 f0 bc 41 fa 0f 0b eb da 4c 89 e7 e8 c4 18
RSP: 0018:ffffc90002d6f658 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88801f08d580 RSI: ffffffff87344e94 RDI: 0000000000000003
RBP: ffff88801ee42100 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff87344d8a R11: 0000000000000000 R12: ffff88801c1dc000
R13: 0000000000000000 R14: 000000000000002c R15: ffff88801c1dc070
FS:  0000555555e8e400(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dd7c590310 CR3: 0000000069a09000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 devlink_region_create+0x39f/0x4c0 net/core/devlink.c:10327
 nsim_dev_dummy_region_init drivers/net/netdevsim/dev.c:481 [inline]
 nsim_dev_probe+0x5f6/0x1150 drivers/net/netdevsim/dev.c:1479
 call_driver_probe drivers/base/dd.c:517 [inline]
 really_probe+0x245/0xcc0 drivers/base/dd.c:596
 __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:751
 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:781
 __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:898
 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
 __device_attach+0x228/0x4a0 drivers/base/dd.c:969
 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
 device_add+0xc35/0x21b0 drivers/base/core.c:3359
 nsim_bus_dev_new drivers/net/netdevsim/bus.c:435 [inline]
 new_device_store+0x48b/0x770 drivers/net/netdevsim/bus.c:302
 bus_attr_store+0x72/0xa0 drivers/base/bus.c:122
 sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
 kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2163 [inline]
 new_sync_write+0x429/0x660 fs/read_write.c:507
 vfs_write+0x7cf/0xae0 fs/read_write.c:594
 ksys_write+0x12d/0x250 fs/read_write.c:647
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f328409d3ef
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01
00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
RSP: 002b:00007ffdc6851140 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f328409d3ef
RDX: 0000000000000003 RSI: 00007ffdc6851190 RDI: 0000000000000004
RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffdc68510e0
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3284144971
R13: 00007ffdc6851190 R14: 0000000000000000 R15: 00007ffdc6851860

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/2ed1159291f2a589b013914f2b60d8172fc525c1.1632925030.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Wed, 29 Sep 2021 14:48:00 +0000 (07:48 -0700)]
Merge tag 'sound-5.15-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This became a slightly large collection of changes, partly because
  I've been off in the last weeks. Most of changes are small and
  scattered while a bit big change is found in HD-audio Realtek codec
  driver; it's a very device-specific fix that has been long wanted, so
  I decided to pick up although it's in the middle RC.

  Some highlights:

   - A new guard ioctl for ALSA rawmidi API to avoid the misuse of the
     new timestamp framing mode; it's for a regression fix

   - HD-audio: a revert of the 5.15 change that might work badly, new
     quirks for Lenovo Legion & co, a follow-up fix for CS8409

   - ASoC: lots of SOF-related fixes, fsl component fixes, corrections
     of mediatek drivers

   - USB-audio: fix for the PM resume

   - FireWire: oxfw and motu fixes"

* tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
  ALSA: pcsp: Make hrtimer forwarding more robust
  ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSION
  ALSA: firewire-motu: fix truncated bytes in message tracepoints
  ASoC: SOF: trace: Omit error print when waking up trace sleepers
  ASoC: mediatek: mt8195: remove wrong fixup assignment on HDMITX
  ASoC: SOF: loader: Re-phrase the missing firmware error to avoid duplication
  ASoC: SOF: loader: release_firmware() on load failure to avoid batching
  ALSA: hda/cs8409: Setup Dolphin Headset Mic as Phantom Jack
  ALSA: pcxhr: "fix" PCXHR_REG_TO_PORT definition
  ASoC: SOF: imx: imx8m: Bar index is only valid for IRAM and SRAM types
  ASoC: SOF: imx: imx8: Bar index is only valid for IRAM and SRAM types
  ASoC: SOF: Fix DSP oops stack dump output contents
  ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i 15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops.
  ALSA: usb-audio: Unify mixer resume and reset_resume procedure
  Revert "ALSA: hda: Drop workaround for a hang at shutdown again"
  ALSA: oxfw: fix transmission method for Loud models based on OXFW971
  ASoC: mediatek: common: handle NULL case in suspend/resume function
  ASoC: fsl_xcvr: register platform component before registering cpu dai
  ASoC: fsl_spdif: register platform component before registering cpu dai
  ASoC: fsl_micfil: register platform component before registering cpu dai
  ...

2 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 29 Sep 2021 14:37:46 +0000 (07:37 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This contains fixes for a resource leak in ccp as well as stack
  corruption in x86/sm4"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sm4 - Fix frame pointer stack corruption
  crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()

2 years agogve: Use kvcalloc() instead of kvzalloc()
Gustavo A. R. Silva [Tue, 28 Sep 2021 21:38:05 +0000 (16:38 -0500)]
gve: Use kvcalloc() instead of kvzalloc()

Use 2-factor argument form kvcalloc() instead of kvzalloc().

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/ipv4/datagram.c: remove superfluous header files from datagram.c
Mianhan Liu [Wed, 29 Sep 2021 05:31:09 +0000 (13:31 +0800)]
net/ipv4/datagram.c: remove superfluous header files from datagram.c

datagram.c hasn't use any macro or function declared in linux/ip.h.
Thus, these files can be removed from datagram.c safely without
affecting the compilation of the net/ipv4 module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/dsa/tag_ksz.c: remove superfluous headers
Mianhan Liu [Wed, 29 Sep 2021 06:41:06 +0000 (14:41 +0800)]
net/dsa/tag_ksz.c: remove superfluous headers

tag_ksz.c hasn't use any macro or function declared in linux/slab.h.
Thus, these files can be removed from tag_ksz.c safely without
affecting the compilation of the ./net/dsa module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/dsa/tag_8021q.c: remove superfluous headers
Mianhan Liu [Wed, 29 Sep 2021 06:36:11 +0000 (14:36 +0800)]
net/dsa/tag_8021q.c: remove superfluous headers

tag_8021q.c hasn't use any macro or function declared in linux/if_bridge.h.
Thus, these files can be removed from tag_8021q.c safely without
affecting the compilation of the ./net/dsa module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: bcm7xxx: Fixed indirect MMD operations
Florian Fainelli [Tue, 28 Sep 2021 20:32:33 +0000 (13:32 -0700)]
net: phy: bcm7xxx: Fixed indirect MMD operations

When EEE support was added to the 28nm EPHY it was assumed that it would
be able to support the standard clause 45 over clause 22 register access
method. It turns out that the PHY does not support that, which is the
very reason for using the indirect shadow mode 2 bank 3 access method.

Implement {read,write}_mmd to allow the standard PHY library routines
pertaining to EEE querying and configuration to work correctly on these
PHYs. This forces us to implement a __phy_set_clr_bits() function that
does not grab the MDIO bus lock since the PHY driver's {read,write}_mmd
functions are always called with that lock held.

Fixes: 83ee102a6998 ("net: phy: bcm7xxx: add support for 28nm EPHY")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/mlx4: Use array_size() helper in copy_to_user()
Gustavo A. R. Silva [Tue, 28 Sep 2021 20:17:33 +0000 (15:17 -0500)]
net/mlx4: Use array_size() helper in copy_to_user()

Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: Use array_size() helper in copy_to_user()
Gustavo A. R. Silva [Tue, 28 Sep 2021 20:12:39 +0000 (15:12 -0500)]
net: bridge: Use array_size() helper in copy_to_user()

Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoethtool: ioctl: Use array_size() helper in copy_{from,to}_user()
Gustavo A. R. Silva [Tue, 28 Sep 2021 19:57:35 +0000 (14:57 -0500)]
ethtool: ioctl: Use array_size() helper in copy_{from,to}_user()

Use array_size() helper instead of the open-coded version in
copy_{from,to}_user().  These sorts of multiplication factors
need to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'hns3-fixes'
David S. Miller [Wed, 29 Sep 2021 10:03:54 +0000 (11:03 +0100)]
Merge branch 'hns3-fixes'

Guangbin Huang says:

====================
net: hns3: add some fixes for -net

This series adds some fixes for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: disable firmware compatible features when uninstall PF
Guangbin Huang [Wed, 29 Sep 2021 09:35:56 +0000 (17:35 +0800)]
net: hns3: disable firmware compatible features when uninstall PF

Currently, the firmware compatible features are enabled in PF driver
initialization process, but they are not disabled in PF driver
deinitialization process and firmware keeps these features in enabled
status.

In this case, if load an old PF driver (for example, in VM) which not
support the firmware compatible features, firmware will still send mailbox
message to PF when link status changed and PF will print
"un-supported mailbox message, code = 201".

To fix this problem, disable these firmware compatible features in PF
driver deinitialization process.

Fixes: ed8fb4b262ae ("net: hns3: add link change event report")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix always enable rx vlan filter problem after selftest
Guangbin Huang [Wed, 29 Sep 2021 09:35:55 +0000 (17:35 +0800)]
net: hns3: fix always enable rx vlan filter problem after selftest

Currently, the rx vlan filter will always be disabled before selftest and
be enabled after selftest as the rx vlan filter feature is fixed on in
old device earlier than V3.

However, this feature is not fixed in some new devices and it can be
disabled by user. In this case, it is wrong if rx vlan filter is enabled
after selftest. So fix it.

Fixes: bcc26e8dc432 ("net: hns3: remove unused code in hns3_self_test()")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: PF enable promisc for VF when mac table is overflow
Guangbin Huang [Wed, 29 Sep 2021 09:35:54 +0000 (17:35 +0800)]
net: hns3: PF enable promisc for VF when mac table is overflow

If unicast mac address table is full, and user add a new mac address, the
unicast promisc needs to be enabled for the new unicast mac address can be
used. So does the multicast promisc.

Now this feature has been implemented for PF, and VF should be implemented
too. When the mac table of VF is overflow, PF will enable promisc for this
VF.

Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix show wrong state when add existing uc mac address
Jian Shen [Wed, 29 Sep 2021 09:35:53 +0000 (17:35 +0800)]
net: hns3: fix show wrong state when add existing uc mac address

Currently, if function adds an existing unicast mac address, eventhough
driver will not add this address into hardware, but it will return 0 in
function hclge_add_uc_addr_common(). It will cause the state of this
unicast mac address is ACTIVE in driver, but it should be in TO-ADD state.

To fix this problem, function hclge_add_uc_addr_common() returns -EEXIST
if mac address is existing, and delete two error log to avoid printing
them all the time after this modification.

Fixes: 72110b567479 ("net: hns3: return 0 and print warning when hit duplicate MAC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
Jian Shen [Wed, 29 Sep 2021 09:35:52 +0000 (17:35 +0800)]
net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE

HCLGE_FLAG_MQPRIO_ENABLE is supposed to set when enable
multiple TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is
supposed to set when enable multiple TCs with ets. But
the driver mixed the flags when updating the tm configuration.

Furtherly, PFC should be available when HCLGE_FLAG_MQPRIO_ENABLE
too, so remove the unnecessary limitation.

Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: don't rollback when destroy mqprio fail
Jian Shen [Wed, 29 Sep 2021 09:35:51 +0000 (17:35 +0800)]
net: hns3: don't rollback when destroy mqprio fail

For destroy mqprio is irreversible in stack, so it's unnecessary
to rollback the tc configuration when destroy mqprio failed.
Otherwise, it may cause the configuration being inconsistent
between driver and netstack.

As the failure is usually caused by reset, and the driver will
restore the configuration after reset, so it can keep the
configuration being consistent between driver and hardware.

Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: remove tc enable checking
Jian Shen [Wed, 29 Sep 2021 09:35:50 +0000 (17:35 +0800)]
net: hns3: remove tc enable checking

Currently, in function hns3_nic_set_real_num_queue(), the
driver doesn't report the queue count and offset for disabled
tc. If user enables multiple TCs, but only maps user
priorities to partial of them, it may cause the queue range
of the unmapped TC being displayed abnormally.

Fix it by removing the tc enable checking, ensure the queue
count is not zero.

With this change, the tc_en is useless now, so remove it.

Fixes: a75a8efa00c5 ("net: hns3: Fix tc setup when netdev is first up")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: do not allow call hns3_nic_net_open repeatedly
Jian Shen [Wed, 29 Sep 2021 09:35:49 +0000 (17:35 +0800)]
net: hns3: do not allow call hns3_nic_net_open repeatedly

hns3_nic_net_open() is not allowed to called repeatly, but there
is no checking for this. When doing device reset and setup tc
concurrently, there is a small oppotunity to call hns3_nic_net_open
repeatedly, and cause kernel bug by calling napi_enable twice.

The calltrace information is like below:
[ 3078.222780] ------------[ cut here ]------------
[ 3078.230255] kernel BUG at net/core/dev.c:6991!
[ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G           O      5.14.0-rc4+ #1
[ 3078.269102] Hardware name:  , BIOS KpxxxFPGA 1P B600 V181 08/12/2021
[ 3078.276801] Workqueue: hclge hclge_service_task [hclge]
[ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 3078.296168] pc : napi_enable+0x80/0x84
tc qdisc sho[w  3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3]

[ 3078.314771] sp : ffff8000108abb20
[ 3078.319099] x29: ffff8000108abb20 x28: 0000000000000000 x27: ffff0820a8490300
[ 3078.329121] x26: 0000000000000001 x25: ffff08209cfc6200 x24: 0000000000000000
[ 3078.339044] x23: ffff0820a8490300 x22: ffff08209cd76000 x21: ffff0820abfe3880
[ 3078.349018] x20: 0000000000000000 x19: ffff08209cd76900 x18: 0000000000000000
[ 3078.358620] x17: 0000000000000000 x16: ffffc816e1727a50 x15: 0000ffff8f4ff930
[ 3078.368895] x14: 0000000000000000 x13: 0000000000000000 x12: 0000259e9dbeb6b4
[ 3078.377987] x11: 0096a8f7e764eb40 x10: 634615ad28d3eab5 x9 : ffffc816ad8885b8
[ 3078.387091] x8 : ffff08209cfc6fb8 x7 : ffff0820ac0da058 x6 : ffff0820a8490344
[ 3078.396356] x5 : 0000000000000140 x4 : 0000000000000003 x3 : ffff08209cd76938
[ 3078.405365] x2 : 0000000000000000 x1 : 0000000000000010 x0 : ffff0820abfe38a0
[ 3078.414657] Call trace:
[ 3078.418517]  napi_enable+0x80/0x84
[ 3078.424626]  hns3_reset_notify_up_enet+0x78/0xd0 [hns3]
[ 3078.433469]  hns3_reset_notify+0x64/0x80 [hns3]
[ 3078.441430]  hclge_notify_client+0x68/0xb0 [hclge]
[ 3078.450511]  hclge_reset_rebuild+0x524/0x884 [hclge]
[ 3078.458879]  hclge_reset_service_task+0x3c4/0x680 [hclge]
[ 3078.467470]  hclge_service_task+0xb0/0xb54 [hclge]
[ 3078.475675]  process_one_work+0x1dc/0x48c
[ 3078.481888]  worker_thread+0x15c/0x464
[ 3078.487104]  kthread+0x160/0x170
[ 3078.492479]  ret_from_fork+0x10/0x18
[ 3078.498785] Code: c8027c81 35ffffa2 d50323bf d65f03c0 (d4210000)
[ 3078.506889] ---[ end trace 8ebe0340a1b0fb44 ]---

Once hns3_nic_net_open() is excute success, the flag
HNS3_NIC_STATE_DOWN will be cleared. So add checking for this
flag, directly return when HNS3_NIC_STATE_DOWN is no set.

Fixes: e888402789b9 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mctp-core-updates'
David S. Miller [Wed, 29 Sep 2021 10:00:12 +0000 (11:00 +0100)]
Merge branch 'mctp-core-updates'

Matt Johnston says:

====================
Updates to MCTP core

This series adds timeouts for MCTP tags (a limited resource), and a few
other improvements to the MCTP core.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Warn if pointer is set for a wrong dev type
Matt Johnston [Wed, 29 Sep 2021 07:26:14 +0000 (15:26 +0800)]
mctp: Warn if pointer is set for a wrong dev type

Should not occur but is a sanity check.

May help tracking down Trinity reported issue
https://lore.kernel.org/lkml/20210913030701.GA5926@xsang-OptiPlex-9020/

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Set route MTU via netlink
Matt Johnston [Wed, 29 Sep 2021 07:26:13 +0000 (15:26 +0800)]
mctp: Set route MTU via netlink

A route's RTAX_MTU can be set in nested RTAX_METRICS

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodoc/mctp: Add a little detail about kernel internals
Jeremy Kerr [Wed, 29 Sep 2021 07:26:12 +0000 (15:26 +0800)]
doc/mctp: Add a little detail about kernel internals

Describe common flows and refcounting behaviour.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Do inits as a subsys_initcall
Jeremy Kerr [Wed, 29 Sep 2021 07:26:11 +0000 (15:26 +0800)]
mctp: Do inits as a subsys_initcall

In a future change, we'll want to provide a registration call for
mctp-specific devices. This requires us to have the networks established
before device driver inits, so run the core init as a subsys_initcall.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add tracepoints for tag/key handling
Jeremy Kerr [Wed, 29 Sep 2021 07:26:10 +0000 (15:26 +0800)]
mctp: Add tracepoints for tag/key handling

The tag allocation, release and bind events are somewhat opaque outside
the kernel; this change adds a few tracepoints to assist in
instrumentation and debugging.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Implement a timeout for tags
Jeremy Kerr [Wed, 29 Sep 2021 07:26:09 +0000 (15:26 +0800)]
mctp: Implement a timeout for tags

Currently, a MCTP (local-eid,remote-eid,tag) tuple is allocated to a
socket on send, and only expires when the socket is closed.

This change introduces a tag timeout, freeing the tuple after a fixed
expiry - currently six seconds. This is greater than (but close to) the
max response timeout in upper-layer bindings.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add refcounts to mctp_dev
Jeremy Kerr [Wed, 29 Sep 2021 07:26:08 +0000 (15:26 +0800)]
mctp: Add refcounts to mctp_dev

Currently, we tie the struct mctp_dev lifetime to the underlying struct
net_device, and hold/put that device as a proxy for a separate mctp_dev
refcount. This works because we're not holding any references to the
mctp_dev that are different from the netdev lifetime.

In a future change we'll break that assumption though, as we'll need to
hold mctp_dev references in a workqueue, which might live past the
netdev unregister notification.

In order to support that, this change introduces a refcount on the
mctp_dev, currently taken by the net_device->mctp_ptr reference, and
released on netdev unregister events. We can then use this for future
references that might outlast the net device.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: locking, lifetime and validity changes for sk_keys
Jeremy Kerr [Wed, 29 Sep 2021 07:26:07 +0000 (15:26 +0800)]
mctp: locking, lifetime and validity changes for sk_keys

We will want to invalidate sk_keys in a future change, which will
require a boolean flag to mark invalidated items in the socket & net
namespace lists. We'll also need to take a reference to keys, held over
non-atomic contexts, so we need a refcount on keys also.

This change adds a validity flag (currently always true) and refcount to
struct mctp_sk_key.  With a refcount on the keys, using RCU no longer
makes much sense; we have exact indications on the lifetime of keys. So,
we also change the RCU list traversal to a locked implementation.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Allow local delivery to the null EID
Jeremy Kerr [Wed, 29 Sep 2021 07:26:06 +0000 (15:26 +0800)]
mctp: Allow local delivery to the null EID

We may need to receive packets addressed to the null EID (==0), but
addressed to us at the physical layer.

This change adds a lookup for local routes when we see a packet
addressed to EID 0, and a local phys address.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Allow MCTP on tun devices
Matt Johnston [Wed, 29 Sep 2021 07:26:05 +0000 (15:26 +0800)]
mctp: Allow MCTP on tun devices

Allowing TUN is useful for testing, to route packets to userspace or to
tunnel between machines.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: micrel: Add support for LAN8804 PHY
Horatiu Vultur [Tue, 28 Sep 2021 18:45:19 +0000 (20:45 +0200)]
net: phy: micrel: Add support for LAN8804 PHY

The LAN8804 PHY has same features as that of LAN8814 PHY except that it
doesn't support 1588, SyncE or Q-USGMII.

This PHY is found inside the LAN966X switches.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
Feng Zhou [Tue, 28 Sep 2021 22:23:59 +0000 (15:23 -0700)]
ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup

The ixgbe driver currently generates a NULL pointer dereference with
some machine (online cpus < 63). This is due to the fact that the
maximum value of num_xdp_queues is nr_cpu_ids. Code is in
"ixgbe_set_rss_queues"".

Here's how the problem repeats itself:
Some machine (online cpus < 63), And user set num_queues to 63 through
ethtool. Code is in the "ixgbe_set_channels",
adapter->ring_feature[RING_F_FDIR].limit = count;

It becomes 63.

When user use xdp, "ixgbe_set_rss_queues" will set queues num.
adapter->num_rx_queues = rss_i;
adapter->num_tx_queues = rss_i;
adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);

And rss_i's value is from
f = &adapter->ring_feature[RING_F_FDIR];
rss_i = f->indices = f->limit;

So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
for (i = 0; i < adapter->num_rx_queues; i++)
if (adapter->xdp_ring[i]->xsk_umem)

It leads to panic.

Call trace:
[exception RIP: ixgbe_xdp+368]
RIP: ffffffffc02a76a0  RSP: ffff9fe16202f8d0  RFLAGS: 00010297
RAX: 0000000000000000  RBX: 0000000000000020  RCX: 0000000000000000
RDX: 0000000000000000  RSI: 000000000000001c  RDI: ffffffffa94ead90
RBP: ffff92f8f24c0c18   R8: 0000000000000000   R9: 0000000000000000
R10: ffff9fe16202f830  R11: 0000000000000000  R12: ffff92f8f24c0000
R13: ffff9fe16202fc01  R14: 000000000000000a  R15: ffffffffc02a7530
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc
 8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808
 9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235
10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384
11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd
12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb
13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88
14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319
15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290
16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8
17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64
18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9
19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c

So I fix ixgbe_max_channels so that it will not allow a setting of queues
to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup,
take the smaller value of num_rx_queues and num_xdp_queues.

Fixes: 4a9b32f30f80 ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
David S. Miller [Wed, 29 Sep 2021 09:30:56 +0000 (10:30 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-09-28

This series contains updates to ice driver only.

Dave adds support for QoS DSCP allowing for DSCP to TC mapping via APP
TLVs.

Ani adds enforcement of DSCP to only supported devices with the
introduction of a feature bitmap and corrects messaging of unsupported
modules based on link mode.

Jake refactors devlink info functions to be void as the functions no
longer return errors.

Jeff fixes a macro name to properly reflect the value.

Len Baker converts a kzalloc allocation to, the preferred, kcalloc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'octeontx2-ptp-vf'
David S. Miller [Wed, 29 Sep 2021 09:27:33 +0000 (10:27 +0100)]
Merge branch 'octeontx2-ptp-vf'

Subbaraya Sundeep <sbhatta@marvell.com>

====================
octeontx2: Add PTP support for VFs

PTP is a shared hardware block which can prepend
RX timestamps to packets before directing packets to
PFs or VFs and can notify the TX timestamps to PFs or VFs
via TX completion queue descriptors. Hence adding PTP
support for VFs is exactly similar to PFs with minimal changes.
This patchset adds that PTP support for VFs.

Patch 1 - When an interface is set in promisc/multicast
the same setting is not retained when changing mtu or channels.
This is due to toggling of the interface by driver but not
calling set_rx_mode in the down-up sequence. Since setting
an interface to multicast properly is required for ptp this is
addressed in this patch.

Patch 2 - Changes in VF driver for registering timestamping
ethtool ops and ndo_ioctl.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-nicvf: Add PTP hardware clock support to NIX VF
Naveen Mamindlapalli [Tue, 28 Sep 2021 17:43:46 +0000 (23:13 +0530)]
octeontx2-nicvf: Add PTP hardware clock support to NIX VF

This patch adds PTP PHC support to NIX VF interfaces. This enables
a VF to run PTP master/slave instance. PTP block being a shared
hardware resource it is recommended to avoid running multiple
PTP instances in the system which will impact the PTP clock
accuracy.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-pf: Enable promisc/allmulti match MCAM entries.
Rakesh Babu [Tue, 28 Sep 2021 17:43:45 +0000 (23:13 +0530)]
octeontx2-pf: Enable promisc/allmulti match MCAM entries.

Whenever the interface is brought up/down then set_rx_mode
function is called by the stack which enables promisc/allmulti
MCAM entries. But there are cases when driver brings
interface down and then up such as while changing number
of channels. In these cases promisc/allmulti MCAM entries
are left disabled as set_rx_mode callback is not called.
This patch enables these MCAM entries in all such cases.

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: qrtr: combine nameservice into main module
Luca Weiss [Tue, 28 Sep 2021 17:11:57 +0000 (19:11 +0200)]
net: qrtr: combine nameservice into main module

Previously with CONFIG_QRTR=m a separate ns.ko would be built which
wasn't done on purpose and should be included in qrtr.ko.

Rename qrtr.c to af_qrtr.c so we can build a qrtr.ko with both af_qrtr.c
and ns.c.

Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-By: Steev Klimaszewski <steev@kali.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20210928171156.6353-1-luca@z3ntu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipv4: remove superfluous header files from fib_notifier.c
Mianhan Liu [Tue, 28 Sep 2021 16:40:11 +0000 (00:40 +0800)]
net: ipv4: remove superfluous header files from fib_notifier.c

fib_notifier.c hasn't use any macro or function declared
in net/netns/ipv4.h.

Thus, these files can be removed from fib_notifier.c safely
without affecting the compilation of the net/ipv4 module.

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Link: https://lore.kernel.org/r/20210928164011.1454-1-liumh1@shanghaitech.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bridge: mcast: Associate the seqcount with its protecting lock.
Thomas Gleixner [Tue, 28 Sep 2021 14:10:49 +0000 (16:10 +0200)]
net: bridge: mcast: Associate the seqcount with its protecting lock.

The sequence count bridge_mcast_querier::seq is protected by
net_bridge::multicast_lock but seqcount_init() does not associate the
seqcount with the lock. This leads to a warning on PREEMPT_RT because
preemption is still enabled.

Let seqcount_init() associate the seqcount with lock that protects the
write section. Remove lockdep_assert_held_once() because lockdep already checks
whether the associated lock is held.

Fixes: 67b746f94ff39 ("net: bridge: mcast: make sure querier port/address updates are consistent")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mdio-ipq4019: Fix the error for an optional regs resource
Cai Huoqing [Tue, 28 Sep 2021 13:48:49 +0000 (21:48 +0800)]
net: mdio-ipq4019: Fix the error for an optional regs resource

The second resource is optional which is only provided on the chipset
IPQ5018. But the blamed commit ignores that and if the resource is
not there it just fails.

the resource is used like this,
if (priv->eth_ldo_rdy) {
val = readl(priv->eth_ldo_rdy);
val |= BIT(0);
writel(val, priv->eth_ldo_rdy);
fsleep(IPQ_PHY_SET_DELAY_US);
}

This patch reverts that to still allow the second resource to be optional
because other SoC have the some MDIO controller and doesn't need to
second resource.

Fixes: fa14d03e014a ("net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()")
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210928134849.2092-1-caihuoqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'pinctrl-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Tue, 28 Sep 2021 23:10:42 +0000 (16:10 -0700)]
Merge tag 'pinctrl-v5.15-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Some few pin control fixes for the v5.15 kernel cycle. The most
  critical is the AMD fixes.

   - Fix wakeup interrupts in the AMD driver affecting AMD laptops.

   - Fix parent irqspec translation in the Qualcomm SPMI GPIO driver.

   - Fix deferred probe handling in the Rockchip driver, this is a
     stopgap solution while we look for something more elegant.

   - Add PM suspend callbacks to the Qualcomm SC7280 driver.

   - Some minor doc fix (should have come in earlier, sorry)"

* tag 'pinctrl-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: sc7280: Add PM suspend callbacks
  gpio/rockchip: fetch deferred output settings on probe
  pinctrl/rockchip: add a queue for deferred pin output settings on probe
  pinctrl: qcom: spmi-gpio: correct parent irqspec translation
  pinctrl: amd: Handle wake-up interrupt
  pinctrl: amd: Add irq field data
  pinctrl: core: Remove duplicated word from devm_pinctrl_unregister()

2 years agoMerge tag 'vfio-v5.15-rc4' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Tue, 28 Sep 2021 23:06:31 +0000 (16:06 -0700)]
Merge tag 'vfio-v5.15-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Fix vfio-ap leak on uninit (Jason Gunthorpe)

 - Add missing prototype arg name (Colin Ian King)

* tag 'vfio-v5.15-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/ap_ops: Add missed vfio_uninit_group_dev()
  vfio/pci: add missing identifier name in argument of function prototype

2 years agoMerge tag 'm68k-for-v5.15-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 28 Sep 2021 20:24:43 +0000 (13:24 -0700)]
Merge tag 'm68k-for-v5.15-tag3' of git://git./linux/kernel/git/geert/linux-m68k

Pull more m68k updates from Geert Uytterhoeven:

 - signal handling fixes

 - removal of set_fs()

[ The set_fs removal isn't strictly a fix, but it's been pending for a
  while and is very welcome. The signal handling fixes resolved an issue
  that was incorrectly attributed to the set_fs changes    - Linus ]

* tag 'm68k-for-v5.15-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Remove set_fs()
  m68k: Provide __{get,put}_kernel_nofault
  m68k: Factor the 8-byte lowlevel {get,put}_user code into helpers
  m68k: Use BUILD_BUG for passing invalid sizes to get_user/put_user
  m68k: Remove the 030 case in virt_to_phys_slow
  m68k: Document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES
  m68k: Leave stack mangling to asm wrapper of sigreturn()
  m68k: Update ->thread.esp0 before calling syscall_trace() in ret_from_signal
  m68k: Handle arrivals of multiple signals correctly

2 years agoMerge tag 'nios2_fixes_for_v5.15_part1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 28 Sep 2021 20:16:52 +0000 (13:16 -0700)]
Merge tag 'nios2_fixes_for_v5.15_part1' of git://git./linux/kernel/git/dinguyen/linux

Pull nios2 fixes from Dinh Nguyen:

 - Fix build warning for unmet dependency for EARLY_PRINTK

 - Remove unused dram_start() function

* tag 'nios2_fixes_for_v5.15_part1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  NIOS2: setup.c: drop unused variable 'dram_start'
  NIOS2: fix kconfig unmet dependency warning for SERIAL_CORE_CONSOLE

2 years agoice: Prefer kcalloc over open coded arithmetic
Len Baker [Sun, 5 Sep 2021 06:50:20 +0000 (08:50 +0200)]
ice: Prefer kcalloc over open coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

In this case this is not actually dynamic sizes: both sides of the
multiplication are constant values. However it is best to refactor this
anyway, just to keep the open-coded math idiom out of code.

So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.

[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Fix macro name for IPv4 fragment flag
Jeff Guo [Fri, 16 Jul 2021 22:16:44 +0000 (15:16 -0700)]
ice: Fix macro name for IPv4 fragment flag

In IPv4 header, fragment flags indicate whether the packet needs
to be fragmented or not. The value 0x20 represents MF (More Fragment); fix
the macro name to match this.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: refactor devlink getter/fallback functions to void
Jacob Keller [Fri, 20 Aug 2021 15:09:50 +0000 (08:09 -0700)]
ice: refactor devlink getter/fallback functions to void

After commit a8f89fa27773 ("ice: do not abort devlink info if board
identifier can't be found"), the getter/fallback() functions no longer
report an error. Convert the interface to a void so that it is no
longer possible to add a version field that is fatal. This makes
sense, because we should not fail to report other versions just
because one of the version pieces could not be found.

Finally, clean up the getter functions line wrapping so that none of
them take more than 80 columns, as is the usual style for networking
files.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Fix link mode handling
Anirudh Venkataramanan [Fri, 16 Jul 2021 22:16:39 +0000 (15:16 -0700)]
ice: Fix link mode handling

The messaging for unsupported module detection is different for
lenient mode and strict mode. Update the code to print the right
messaging for a given link mode.

Media topology conflict is not an error in lenient mode, so return
an error code only if not in lenient mode.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Add feature bitmap, helpers and a check for DSCP
Anirudh Venkataramanan [Fri, 16 Jul 2021 22:16:41 +0000 (15:16 -0700)]
ice: Add feature bitmap, helpers and a check for DSCP

DSCP a.k.a L3 QoS is only supported on certain devices. To enforce this,
this patch introduces a bitmap of features and helper functions.

The feature bitmap is set based on device IDs on driver init. Currently,
DSCP is the only feature in this bitmap, but there will be more in the
future. In the DCB netlink flow, check if the feature bit is set before
exercising DSCP.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Add DSCP support
Dave Ertman [Fri, 6 Aug 2021 20:53:56 +0000 (13:53 -0700)]
ice: Add DSCP support

Implement code to handle submission of APP TLV's
containing DSCP to TC mapping.

The first such mapping received on an interface
will cause that PF to switch to L3 DSCP QoS mode,
apply the default config for that mode, and apply
the received mapping.

Only one such mapping will be allowed per DSCP value,
and when the last DSCP mapping is deleted, the PF
will switch back into L2 VLAN QoS mode, applying the
appropriate default QoS settings.

L3 DSCP QoS mode will only be allowed in SW DCBx
mode, in other words, when the FW LLDP engine is
disabled.  Commands that break this mutual exclusivity
will be blocked.

Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoMerge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Linus Torvalds [Tue, 28 Sep 2021 14:53:53 +0000 (07:53 -0700)]
Merge tag 'fsverity-for-linus' of git://git./fs/fscrypt/fscrypt

Pull fsverity fix from Eric Biggers:
 "Fix an integer overflow when computing the Merkle tree layout of
  extremely large files, exposed by btrfs adding support for fs-verity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fs-verity: fix signed integer overflow with i_size near S64_MAX

2 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Tue, 28 Sep 2021 14:27:29 +0000 (07:27 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio/vdpa fixes from Michael Tsirkin:
 "Fixes up some issues in rc1"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa: potential uninitialized return in vhost_vdpa_va_map()
  vdpa/mlx5: Avoid executing set_vq_ready() if device is reset
  vdpa/mlx5: Clear ready indication for control VQ
  vduse: Cleanup the old kernel states after reset failure
  vduse: missing error code in vduse_init()
  virtio: don't fail on !of_device_is_compatible

2 years agoMerge tag 'mmc-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Tue, 28 Sep 2021 14:24:47 +0000 (07:24 -0700)]
Merge tag 'mmc-v5.15-2' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:

 - renesas_sdhi: Fix regression with hard reset on old SDHIs

 - dw_mmc: Only inject fault before done/error

* tag 'mmc-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: renesas_sdhi: fix regression with hard reset on old SDHIs
  mmc: dw_mmc: Only inject fault before done/error

2 years agogve: DQO: avoid unused variable warnings
Arnd Bergmann [Tue, 28 Sep 2021 14:15:13 +0000 (16:15 +0200)]
gve: DQO: avoid unused variable warnings

The use of dma_unmap_addr()/dma_unmap_len() in the driver causes
multiple warnings when these macros are defined as empty, e.g.
in an ARCH=i386 allmodconfig build:

drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_add_skb_no_copy_dqo':
drivers/net/ethernet/google/gve/gve_tx_dqo.c:494:40: error: unused variable 'buf' [-Werror=unused-variable]
  494 |                 struct gve_tx_dma_buf *buf =

This is not how the NEED_DMA_MAP_STATE macros are meant to work,
as they rely on never using local variables or a temporary structure
like gve_tx_dma_buf.

Remote the gve_tx_dma_buf definition and open-code the contents
in all places to avoid the warning. This causes some rather long
lines but otherwise ends up making the driver slightly smaller.

Fixes: a57e5de476be ("gve: DQO: Add TX path")
Link: https://lore.kernel.org/netdev/20210723231957.1113800-1-bcf@google.com/
Link: https://lore.kernel.org/netdev/20210721151100.2042139-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-pf: Use hardware register for CQE count
Geetha sowjanya [Tue, 28 Sep 2021 05:55:26 +0000 (11:25 +0530)]
octeontx2-pf: Use hardware register for CQE count

Current driver uses software CQ head pointer to poll on CQE
header in memory to determine if CQE is valid. Software needs
to make sure, that the reads of the CQE do not get re-ordered
so much that it ends up with an inconsistent view of the CQE.
To ensure that DMB barrier after read to first CQE cacheline
and before reading of the rest of the CQE is needed.
But having barrier for every CQE read will impact the performance,
instead use hardware CQ head and tail pointers to find the
valid number of CQEs.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Tue, 28 Sep 2021 12:52:46 +0000 (13:52 +0100)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-09-28

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 14 day(s) which contain
a total of 11 files changed, 139 insertions(+), 53 deletions(-).

The main changes are:

1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk.

2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh.

3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann.

4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi.

5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer.

6) Fix return value handling for struct_ops BPF programs, from Hou Tao.

7) Various fixes to BPF selftests, from Jiri Benc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
,

2 years agoMerge branch 'octeontx2-af-external-ptp-clock'
David S. Miller [Tue, 28 Sep 2021 12:50:38 +0000 (13:50 +0100)]
Merge branch 'octeontx2-af-external-ptp-clock'

Hariprasad Kelam says:

====================
Externel ptp clock support

Externel ptp support is required in a scenario like connecting
a external timing device to the chip for time synchronization.
This series of patches adds support to ptp driver to use external
clock and enables PTP config in CN10K MAC block (RPM). Currently
PTP configuration is left unchanged in FLR handler these patches
addresses the same.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
,

2 years agoocteontx2-af: Add external ptp input clock
Yi Guo [Tue, 28 Sep 2021 11:31:01 +0000 (17:01 +0530)]
octeontx2-af: Add external ptp input clock

PTP hardware block can be configured to utilize
the external clock. Also the current ptp timestamp
can be captured when external trigger is applied on
a gpio pin. These features are required in scenarios
like connecting a external timing device to the chip
for time synchronization. The timing device provides
the clock and trigger(PPS signal) to the PTP block.
This patch does the following:
1. configures PTP block to use external clock
frequency and timestamp capture on external event.
2. sends PTP_REQ_EXTTS events to kernel ptp phc susbsytem
with captured timestamps
3. aligns PPS edge to adjusted ptp clock in the ptp device
by setting the PPS_THRESH to the reminder of the last
timestamp value captured by external PPS

Signed-off-by: Yi Guo <yig@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>