review.tizen.org Git - platform/kernel/linux-rpi.git/log

net/sched: act_pedit: Add extack message for offload failure

For better error reporting to user space, add an extack message when
pedit action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than set/add
commands.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_mpls: Add extack messages for offload failure

For better error reporting to user space, add extack messages when mpls
action offload fails.

Example:

# echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

# tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action mpls dec_ttl
Error: cls_matchall: Failed to setup flow action.
We have an error talking to the kernel

# cat /sys/kernel/tracing/trace_pipe
tc-182 [000] b..1. 18.693915: netlink_extack: msg=act_mpls: Offload not supported when "dec_ttl" option is used
tc-182 [000] ..... 18.693921: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_mirred: Add extack message for offload failure

For better error reporting to user space, add an extack message when
mirred action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than ingress/egress
mirror/redirect.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_gact: Add extack messages for offload failure

For better error reporting to user space, add extack messages when gact
action offload fails.

Example:

# echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

# tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action continue
Error: cls_matchall: Failed to setup flow action.
We have an error talking to the kernel

# cat /sys/kernel/tracing/trace_pipe
       tc-181     [002] b..1.   105.493450: netlink_extack: msg=act_gact: Offload of "continue" action is not supported
       tc-181     [002] .....   105.493466: netlink_extack: msg=cls_matchall: Failed to setup flow action

# tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action reclassify
Error: cls_matchall: Failed to setup flow action.
We have an error talking to the kernel

# cat /sys/kernel/tracing/trace_pipe
       tc-183     [002] b..1.   124.126477: netlink_extack: msg=act_gact: Offload of "reclassify" action is not supported
       tc-183     [002] .....   124.126489: netlink_extack: msg=cls_matchall: Failed to setup flow action

# tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action pipe action drop
Error: cls_matchall: Failed to setup flow action.
We have an error talking to the kernel

# cat /sys/kernel/tracing/trace_pipe
       tc-185     [002] b..1.   137.097791: netlink_extack: msg=act_gact: Offload of "pipe" action is not supported
       tc-185     [002] .....   137.097804: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_api: Add extack to offload_act_setup() callback

The callback is used by various actions to populate the flow action
structure prior to offload. Pass extack to this callback so that the
various actions will be able to report accurate error messages to user
space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: flower: Take verbose flag into account when logging error messages

The verbose flag was added in commit 81c7288b170a ("sched: cls: enable
verbose logging") to avoid suppressing logging of error messages that
occur "when the rule is not to be exclusively executed by the hardware".

However, such error messages are currently suppressed when setup of flow
action fails. Take the verbose flag into account to avoid suppressing
error messages. This is done by using the extack pointer initialized by
tc_cls_common_offload_init(), which performs the necessary checks.

Before:

# tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
# tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000

After:

# tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
# tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
Warning: cls_flower: Failed to setup flow action.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: matchall: Take verbose flag into account when logging error messages

The verbose flag was added in commit 81c7288b170a ("sched: cls: enable
verbose logging") to avoid suppressing logging of error messages that
occur "when the rule is not to be exclusively executed by the hardware".

However, such error messages are currently suppressed when setup of flow
action fails. Take the verbose flag into account to avoid suppressing
error messages. This is done by using the extack pointer initialized by
tc_cls_common_offload_init(), which performs the necessary checks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-04-07

Alexander Lobakin says:

This hunts down several places around packet templates/dummies for
switch rules which are either repetitive, fragile or just not
really readable code.
It's a common need to add new packet templates and to review such
changes as well, try to simplify both with the help of a pair
macros and aliases.
ice_find_dummy_packet() became very complex at this point with tons
of nested if-elses. It clearly showed this approach does not scale,
so convert its logics to the simple mask-match + static const array.

bloat-o-meter is happy about that (built w/ LLVM 13):

add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056)
Function                                     old     new   delta
ice_fill_adv_dummy_packet                    289     291      +2
ice_adv_add_update_vsi_list                  201       -    -201
ice_add_adv_rule                            2950    2093    -857
Total: Before=414512, After=413456, chg -0.25%
add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672)
RO Data                                      old     new   delta
ice_dummy_pkt_profiles                         -     672    +672
Total: Before=37895, After=38567, chg +1.77%

Diffstat also looks nice, and adding new packet templates now takes
less lines.

We'll probably come out with dynamic template crafting in a while,
but for now let's improve what we have currently.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'aspeed-mdio-c45'

Potin Lai says:

====================
mdio: aspeed: Add Clause 45 support for Aspeed MDIO

This patch series add Clause 45 support for Aspeed MDIO driver, and
separate c22 and c45 implementation into different functions.

LINK: [v1] https://lore.kernel.org/all/20220329161949.19762-1-potin.lai@quantatw.com/
LINK: [v2] https://lore.kernel.org/all/20220406170055.28516-1-potin.lai@quantatw.com/

Changes v2 --> v3:
- sort local variable sequence in reverse Christmas tree format.

Changes v1 --> v2:
- add C45 to probe_capabilities
- break one patch into 3 small patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: aspeed: Add c45 support

Add Clause 45 support for Aspeed mdio driver.

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: aspeed: Introduce read write function for c22 and c45

Add following additional functions to move out the implementation from
aspeed_mdio_read() and aspeed_mdio_write().

c22:
- aspeed_mdio_read_c22()
- aspeed_mdio_write_c22()

c45:
- aspeed_mdio_read_c45()
- aspeed_mdio_write_c45()

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: aspeed: move reg accessing part into separate functions

Add aspeed_mdio_op() and aseed_mdio_get_data() for register accessing.

aspeed_mdio_op() handles operations, write command to control register,
then check and wait operations is finished (bit 31 is cleared).

aseed_mdio_get_data() fetchs the result value of operation from data
register.

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atm: remove the ambassador driver

The driver for ATM Ambassador devices spews build warnings on
microblaze. The virt_to_bus() calls discard the volatile keyword.
The right thing to do would be to migrate this driver to a modern
DMA API but it seems unlikely anyone is actually using it.
There had been no fixes or functional changes here since
the git era begun.

In fact it sounds like the FW loading was broken from 2008
'til 2012 - see commit fcdc90b025e6 ("atm: forever loop loading
ambassador firmware").

Let's remove this driver, there isn't much changing in the APIs,
if users come forward we can apologize and revert.

Link: https://lore.kernel.org/all/20220321144013.440d7fc0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt-xdp-multi-buffer'

Michael Chan says:

====================
bnxt: Support XDP multi buffer

This series adds XDP multi buffer support, allowing MTU to go beyond
the page size limit.

v4: Rebase with latest net-next
v3: Simplify page mode buffer size calculation
Check to make sure XDP program supports multipage packets
v2: Fix uninitialized variable warnings in patch 1 and 10.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: XDP multibuffer enablement

Allow aggregation buffers to be in place in the receive path and
allow XDP programs to be attached when using a larger than 4k MTU.

v3: Add a check to sure XDP program supports multipage packets.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: support transmit and free of aggregation buffers

This patch adds the following features:
- Support for XDP_TX and XDP_DROP action when using xdp_buff
with frags
- Support for freeing all frags attached to an xdp_buff
- Cleanup of TX ring buffers after transmits complete
- Slight change in definition of bnxt_sw_tx_bd since nr_frags
and RX producer may both need to be used
- Clear out skb_shared_info at the end of the buffer

v2: Fix uninitialized variable warning in bnxt_xdp_buff_frags_free().

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff

Since we have an xdp_buff with frags there needs to be a way to
convert that into a valid sk_buff in the event that XDP_PASS is
the resulting operation. This adds a new rx_skb_func when the
netdev has an MTU that prevents the packets from sitting in a
single page.

This also make sure that GRO/LRO stay disabled even when using
the aggregation ring for large buffers.

v3: Use BNXT_PAGE_MODE_BUF_SIZE for build_skb

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: add page_pool support for aggregation ring when using xdp

If we are using aggregation rings with XDP enabled, allocate page
buffers for the aggregation rings from the page_pool.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: change receive ring space parameters

Modify ring header data split and jumbo parameters to account
for the fact that the design for XDP multibuffer puts close to
the first 4k of data in a page and the remaining portions of
the packet go in the aggregation ring.

v3: Simplified code around initial buffer size calculation

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: set xdp_buff pfmemalloc flag if needed

Set the pfmemaloc flag in the xdp buff so that this can be
copied to the skb if needed for an XDP_PASS action.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: adding bnxt_rx_agg_pages_xdp for aggregated xdp

This patch adds a new function that will read pages from the
aggregation ring and create an xdp_buff with frags based on
the entries in the aggregation ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: rename bnxt_rx_pages to bnxt_rx_agg_pages_skb

Clarify that this is reading buffers from the aggregation ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: refactor bnxt_rx_pages operate on skb_shared_info

Rather than operating on an sk_buff, add frags from the aggregation
ring into the frags of an skb_shared_info. This will allow the
caller to use either an sk_buff or xdp_buff.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: add flag to denote that an xdp program is currently attached

This will be used to determine if bnxt_rx_xdp should be called
rather than calling it every time.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff

Move initialization of xdp_buff outside of bnxt_rx_xdp to prepare
for allowing bnxt_rx_xdp to operate on multibuffer xdp_buffs.

v2: Fix uninitalized variables warning in bnxt_xdp.c.
v3: Add new define BNXT_PAGE_MODE_BUF_SIZE

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tls-rx-refactor-part-1'

Jakub Kicinski says:

====================
tls: rx: random refactoring part 1

TLS Rx refactoring. Part 1 of 3. A couple of features to follow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tls: hw: rx: use return value of tls_device_decrypted() to carry status

Instead of tls_device poking into internals of the message
return 1 from tls_device_decrypted() if the device handled
the decryption.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: refactor decrypt_skb_update()

Use early return and a jump label to remove two indentation levels.
No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: don't issue wake ups when data is decrypted

We inform the applications that data is available when
the record is received. Decryption happens inline inside
recvmsg or splice call. Generating another wakeup inside
the decryption handler seems pointless as someone must
be actively reading the socket if we are executing this
code.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: replace 'back' with 'offset'

The padding length TLS 1.3 logic is searching for content_type from
the end of text. IMHO the code is easier to parse if we calculate
offset and decrement it rather than try to maintain positive offset
from the end of the record called "back".

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: use a define for tag length

TLS 1.3 has to strip padding, and it starts out 16 bytes
from the end of the record. Make it clear this is because
of the auth tag.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: init decrypted status in tls_read_size()

We set the record type in tls_read_size(), can as well init
the tlm->decrypted field there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: don't store the decryption status in socket context

Similar justification to previous change, the information
about decryption status belongs in the skb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: don't store the record type in socket context

Original TLS implementation was handling one record at a time.
It stashed the type of the record inside tls context (per socket
structure) for convenience. When async crypto support was added
[1] the author had to use skb->cb to store the type per-message.

The use of skb->cb overlaps with strparser, however, so a hybrid
approach was taken where type is stored in context while parsing
(since we parse a message at a time) but once parsed its copied
to skb->cb.

Recently a workaround for sockmaps [2] exposed the previously
private struct _strp_msg and started a trend of adding user
fields directly in strparser's header. This is cleaner than
storing information about an skb in the context.

This change is not strictly necessary, but IMHO the ownership
of the context field is confusing. Information naturally
belongs to the skb.

[1] commit 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
[2] commit b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: drop pointless else after goto

Pointless else branch after goto makes the code harder to refactor
down the line.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: jump to a more appropriate label

'recv_end:' checks num_async and decrypted, and is then followed
by the 'end' label. Since we know that decrypted and num_async
are 0 at the start we can jump to 'end'.

Move the init of decrypted and num_async to let the compiler
catch if I'm wrong.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-5.18-rc2' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.

  Current release - new code bugs:

   - mctp: correct mctp_i2c_header_create result

   - eth: fungible: fix reference to __udivdi3 on 32b builds

   - eth: micrel: remove latencies support lan8814

  Previous releases - regressions:

   - bpf: resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT

   - vrf: fix packet sniffing for traffic originating from ip tunnels

   - rxrpc: fix a race in rxrpc_exit_net()

   - dsa: revert "net: dsa: stop updating master MTU from master.c"

   - eth: ice: fix MAC address setting

  Previous releases - always broken:

   - tls: fix slab-out-of-bounds bug in decrypt_internal

   - bpf: support dual-stack sockets in bpf_tcp_check_syncookie

   - xdp: fix coalescing for page_pool fragment recycling

   - ovs: fix leak of nested actions

   - eth: sfc:
      - add missing xdp queue reinitialization
      - fix using uninitialized xdp tx_queue

   - eth: ice:
      - clear default forwarding VSI during VSI release
      - fix broken IFF_ALLMULTI handling
      - synchronize_rcu() when terminating rings

   - eth: qede: confirm skb is allocated before using

   - eth: aqc111: fix out-of-bounds accesses in RX fixup

   - eth: slip: fix NPD bug in sl_tx_timeout()"

* tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  drivers: net: slip: fix NPD bug in sl_tx_timeout()
  bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
  bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
  myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
  net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
  qede: confirm skb is allocated before using
  net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n
  net: phy: mscc-miim: reject clause 45 register accesses
  net: axiemac: use a phandle to reference pcs_phy
  dt-bindings: net: add pcs-handle attribute
  net: axienet: factor out phy_node in struct axienet_local
  net: axienet: setup mdio unconditionally
  net: sfc: fix using uninitialized xdp tx_queue
  rxrpc: fix a race in rxrpc_exit_net()
  net: openvswitch: fix leak of nested actions
  net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
  net: openvswitch: don't send internal clone attribute to the userspace.
  net: micrel: Fix KS8851 Kconfig
  ice: clear cmd_type_offset_bsz for TX rings
  ice: xsk: fix VSI state check in ice_xsk_wakeup()
  ...

net: mpls: fix memdup.cocci warning

Simply use kmemdup instead of explicitly allocating and copying memory.

Generated by: scripts/coccinelle/api/memdup.cocci

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Link: https://lore.kernel.org/r/20220406114629.182833-1-gongruiqi1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hv_netvsc: Print value of invalid ID in netvsc_send_{completion,tx_complete}()

That being useful for debugging purposes.

Notice that the packet descriptor is in "private" guest memory, so
that Hyper-V can not tamper with it.

While at it, remove two unnecessary u64-casts.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: remove an unneed NULL check on list iterator

The define for_each_pci_dev(d) is:
while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)

Thus, the list iterator 'd' is always non-NULL so it doesn't need to
be checked. So just remove the unnecessary NULL check. Also remove the
unnecessary initializer because the list iterator is always initialized.

Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Link: https://lore.kernel.org/r/20220406015921.29267-1-xiam0nd.tong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: Stop using iommu_present()

Even if an IOMMU might be present for some PCI segment in the system,
that doesn't necessarily mean it provides translation for the device
we care about. It appears that what we care about here is specifically
whether DMA mapping ops involve any IOMMU overhead or not, so check for
translation actually being active for our device.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/7350f957944ecfce6cce90f422e3992a1f428775.1649166055.git.robin.murphy@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: set default assignment identifier to NET_NAME_ENUM

As noted in the original commit 685343fc3ba6 ("net: add
name_assign_type netdev attribute")

  ... when the kernel has given the interface a name using global
  device enumeration based on order of discovery (ethX, wlanY, etc)
  ... are labelled NET_NAME_ENUM.

That describes this case, so set the default for the devices here to
NET_NAME_ENUM.  Current popular network setup tools like systemd use
this only to warn if you're setting static settings on interfaces that
might change, so it is expected this only leads to better user
information, but not changing of interfaces, etc.

Signed-off-by: Ian Wienand <iwienand@redhat.com>
Link: https://lore.kernel.org/r/20220406093635.1601506-1-iwienand@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Add tracepoint for tcp_set_ca_state

The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense to
add tracepoint for congestion status set function to summate cc
status duration and evaluate the performance of network
and congestion algorithm. the backgound of this patch is below.

Link: https://github.com/iovisor/bcc/pull/3899
Signed-off-by: Ping Gan <jacky_gam_2001@163.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-core: rx_otherhost_dropped to core_stats

Increment rx_otherhost_dropped counter when packet dropped due to
mismatched dest MAC addr.

An example when this drop can occur is when manually crafting raw
packets that will be consumed by a user space application via a tap
device. For testing purposes local traffic was generated using trafgen
for the client and netcat to start a server

Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other
with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the
counter was incremented. (Also had to modify iproute2 to show the stat,
additional patch for that coming next.)

Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-create-a-net-core-internal-header'

Jakub Kicinski says:

====================
net: create a net/core/ internal header

We are adding stuff to netdevice.h which really should be
local to net/core/. Create a net/core/dev.h header and use it.
Minor cleanups precede.
====================

Link: https://lore.kernel.org/r/20220406213754.731066-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: extract a few internals from netdevice.h

There's a number of functions and static variables used
under net/core/ but not from the outside. We currently
dump most of them into netdevice.h. That bad for many
reasons:
- netdevice.h is very cluttered, hard to figure out
what the APIs are;
- netdevice.h is very long;
- we have to touch netdevice.h more which causes expensive
incremental builds.

Create a header under net/core/ and move some declarations.

The new header is also a bit of a catch-all but that's
fine, if we create more specific headers people will
likely over-think where their declaration fit best.
And end up putting them in netdevice.h, again.

More work should be done on splitting netdevice.h into more
targeted headers, but that'd be more time consuming so small
steps.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: unexport a handful of dev_* functions

We have a bunch of functions which are only used under
net/core/ yet they get exported. Remove the exports.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hyperv: remove use of bpf_op_t

Following patch will hide that typedef. There seems to be
no strong reason for hyperv to use it, so let's not.

Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'hyperv-fixes-signed-20220407' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

- Correctly propagate coherence information for VMbus devices (Michael
   Kelley)

- Disable balloon and memory hot-add on ARM64 temporarily (Boqun Feng)

- Use barrier to prevent reording when reading ring buffer (Michael
   Kelley)

- Use virt_store_mb in favour of smp_store_mb (Andrea Parri)

- Fix VMbus device object initialization (Andrea Parri)

- Deactivate sysctl_record_panic_msg on isolated guest (Andrea Parri)

- Fix a crash when unloading VMbus module (Guilherme G. Piccoli)

* tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
  Drivers: hv: balloon: Disable balloon and hot-add accordingly
  Drivers: hv: balloon: Support status report for larger page sizes
  Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
  PCI: hv: Propagate coherence from VMbus device to PCI device
  Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
  Drivers: hv: vmbus: Fix potential crash on module unload
  Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register()
  Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests

Merge tag 'random-5.18-rc2-for-linus' of git://git./linux/kernel/git/crng/random

Pull random number generator fixes from Jason Donenfeld:

- Another fixup to the fast_init/crng_init split, this time in how much
   entropy is being credited, from Jan Varho.

- As discussed, we now opportunistically call try_to_generate_entropy()
   in /dev/urandom reads, as a replacement for the reverted commit. I
   opted to not do the more invasive wait_for_random_bytes() change at
   least for now, preferring to do something smaller and more obvious
   for the time being, but maybe that can be revisited as things evolve
   later.

- Userspace can use FUSE or userfaultfd or simply move a process to
   idle priority in order to make a read from the random device never
   complete, which breaks forward secrecy, fixed by overwriting
   sensitive bytes early on in the function.

- Jann Horn noticed that /dev/urandom reads were only checking for
   pending signals if need_resched() was true, a bug going back to the
   genesis commit, now fixed by always checking for signal_pending() and
   calling cond_resched(). This explains various noticeable signal
   delivery delays I've seen in programs over the years that do long
   reads from /dev/urandom.

- In order to be more like other devices (e.g. /dev/zero) and to
   mitigate the impact of fixing the above bug, which has been around
   forever (users have never really needed to check the return value of
   read() for medium-sized reads and so perhaps many didn't), we now
   move signal checking to the bottom part of the loop, and do so every
   PAGE_SIZE-bytes.

* tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: check for signals every PAGE_SIZE chunk of /dev/[u]random
  random: check for signal_pending() outside of need_resched() check
  random: do not allow user to keep crng key around on stack
  random: opportunistically initialize on /dev/urandom reads
  random: do not split fast init input in add_hwgenerator_randomness()

Merge tag 'ata-5.18-rc2' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:

- Fix a compilation warning due to an uninitialized variable in
   ata_sff_lost_interrupt(), from me.

- Fix invalid internal command tag handling in the sata_dwc_460ex
   driver, from Christian.

- Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command
   causes the drives to hang, from Christian.

- Change the config option CONFIG_SATA_LPM_POLICY back to its original
   name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with
   users losing their configuration (as discussed during the merge
   window), from Mario.

* tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back
  ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
  ata: sata_dwc_460ex: Fix crash due to OOB write
  ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt()

ice: switch: convert packet template match code to rodata

Trade text size for rodata size and replace tons of nested if-elses
to the const mask match based structs. The almost entire
ice_find_dummy_packet() now becomes just one plain while-increment
loop. The order in ice_dummy_pkt_profiles[] should be same with the
if-elses order previously, as masks become less and less strict
through the array to follow the original code flow.
Apart from removing 80 locs of 4-level if-elses, it brings a solid
text size optimization:

add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056)
Function                                     old     new   delta
ice_fill_adv_dummy_packet                    289     291      +2
ice_adv_add_update_vsi_list                  201       -    -201
ice_add_adv_rule                            2950    2093    -857
Total: Before=414512, After=413456, chg -0.25%
add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672)
RO Data                                      old     new   delta
ice_dummy_pkt_profiles                         -     672    +672
Total: Before=37895, After=38567, chg +1.77%

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: switch: use convenience macros to declare dummy pkt templates

Declarations of dummy/template packet headers and offsets can be
minified to improve readability and simplify adding new templates.
Move all the repetitive constructions into two macros and let them
do the name and type expansions.
Linewrap removal is yet another positive side effect.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: switch: use a struct to pass packet template params

ice_find_dummy_packet() contains a lot of boilerplate code and a
nice room for copy-paste mistakes.
Instead of passing 3 separate pointers back and forth to get packet
template (dummy) params, directly return a structure containing
them. Then, use a macro to compose compound literals and avoid code
duplication on return path.
Now, dummy packet type/name is needed only once to return a full
correct triple pkt-pkt_len-offsets, and those are all one-liners.
dummy_ipv4_gtpu_ipv4_packet_offsets is just moved around and renamed
(as well as dummy_ipv6_gtp_packet_offsets) with no function changes.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: switch: unobscurify bitops loop in ice_fill_adv_dummy_packet()

A loop performing header modification according to the provided mask
in ice_fill_adv_dummy_packet() is very cryptic (and error-prone).
Replace two identical cast-deferences with a variable. Replace three
struct-member-array-accesses with a variable. Invert the condition,
reduce the indentation by one -> eliminate line wraps.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: switch: add and use u16[] aliases to ice_adv_lkup_elem::{h, m}_u

ice_adv_lkup_elem fields h_u and m_u are being accessed as raw u16
arrays in several places.
To reduce cast and braces burden, add permanent array-of-u16 aliases
with the same size as the `union ice_prot_hdr` itself via anonymous
unions to the actual struct declaration, and just access them
directly.

This:
- removes the need to cast the union to u16[] and then dereference
it each time -> reduces the horizon for potential bugs;
- improves -Warray-bounds coverage -- the array size is now known
at compilation time;
- addresses cppcheck complaints.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

drivers: net: slip: fix NPD bug in sl_tx_timeout()

When a slip driver is detaching, the slip_close() will act to
cleanup necessary resources and sl->tty is set to NULL in
slip_close(). Meanwhile, the packet we transmit is blocked,
sl_tx_timeout() will be called. Although slip_close() and
sl_tx_timeout() use sl->lock to synchronize, we don`t judge
whether sl->tty equals to NULL in sl_tx_timeout() and the
null pointer dereference bug will happen.

   (Thread 1)                 |      (Thread 2)
                              | slip_close()
                              |   spin_lock_bh(&sl->lock)
                              |   ...
...                           |   sl->tty = NULL //(1)
sl_tx_timeout()               |   spin_unlock_bh(&sl->lock)
  spin_lock(&sl->lock);       |
  ...                         |   ...
  tty_chars_in_buffer(sl->tty)|
    if (tty->ops->..) //(2)   |
    ...                       |   synchronize_rcu()

We set NULL to sl->tty in position (1) and dereference sl->tty
in position (2).

This patch adds check in sl_tx_timeout(). If sl->tty equals to
NULL, sl_tx_timeout() will goto out.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

prestera: acl: add action hw_stats support

Currently, when user adds a tc action and the action gets offloaded,
the user expects the HW stats to be counted also. This limits the
amount of supported offloaded filters, as HW counter resources may
be quite limited. Without counter assigned, the HW is capable to
carry much more filters.

To resolve the issue above, the following types of HW stats are
offloaded and supported by the driver:

any       - current default, user does not care about the type.
delayed   - polled from HW periodically.
disabled  - no HW stats needed.
immediate - not supported.

Example:
  tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x11 \
    action drop
  tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x12 \
    action drop hw_stats disabled
  tc filter add dev sw1p1 ingress proto ip flower skip_sw ip_proto 0x14 \
    action drop hw_stats delayed

Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Link: https://lore.kernel.org/r/1649164814-18731-1-git-send-email-volodymyr.mytnyk@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix locking issues with loops over idev->addr_list

idev->addr_list needs to be protected by idev->lock. However, it is not
always possible to do so while iterating and performing actions on
inet6_ifaddr instances. For example, multiple functions (like
addrconf_{join,leave}_anycast) eventually call down to other functions
that acquire the idev->lock. The current code temporarily unlocked the
idev->lock during the loops, which can cause race conditions. Moving the
locks up is also not an appropriate solution as the ordering of lock
acquisition will be inconsistent with for example mc_lock.

This solution adds an additional field to inet6_ifaddr that is used
to temporarily add the instances to a temporary list while holding
idev->lock. The temporary list can then be traversed without holding
idev->lock. This change was done in two places. In addrconf_ifdown, the
list_for_each_entry_safe variant of the list loop is also no longer
necessary as there is no deletion within that specific loop.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge https://git./linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2022-04-06

We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).

The main changes are:

1) rethook related fixes, from Jiri and Masami.

2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.

3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
  bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
  bpf: selftests: Test fentry tracing a struct_ops program
  bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
  rethook: Fix to use WRITE_ONCE() for rethook:: Handler
  selftests/bpf: Fix warning comparing pointer to 0
  bpf: Fix sparse warnings in kprobe_multi_resolve_syms
  bpftool: Explicit errno handling in skeletons
====================

Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

random: check for signals every PAGE_SIZE chunk of /dev/[u]random

In 1448769c9cdb ("random: check for signal_pending() outside of
need_resched() check"), Jann pointed out that we previously were only
checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
had TIF_NEED_RESCHED set, which meant in practice, super long reads to
/dev/[u]random would delay signal handling by a long time. I tried this
using the below program, and indeed I wasn't able to interrupt a
/dev/urandom read until after several megabytes had been read. The bug
he fixed has always been there, and so code that reads from /dev/urandom
without checking the return value of read() has mostly worked for a long
time, for most sizes, not just for <= 256.

Maybe it makes sense to keep that code working. The reason it was so
small prior, ignoring the fact that it didn't work anyway, was likely
because /dev/random used to block, and that could happen for pretty
large lengths of time while entropy was gathered. But now, it's just a
chacha20 call, which is extremely fast and is just operating on pure
data, without having to wait for some external event. In that sense,
/dev/[u]random is a lot more like /dev/zero.

Taking a page out of /dev/zero's read_zero() function, it always returns
at least one chunk, and then checks for signals after each chunk. Chunk
sizes there are of length PAGE_SIZE. Let's just copy the same thing for
/dev/[u]random, and check for signals and cond_resched() for every
PAGE_SIZE amount of data. This makes the behavior more consistent with
expectations, and should mitigate the impact of Jann's fix for the
age-old signal check bug.

---- test program ----

  #include <unistd.h>
  #include <signal.h>
  #include <stdio.h>
  #include <sys/random.h>

  static unsigned char x[~0U];

  static void handle(int) { }

  int main(int argc, char *argv[])
  {
    pid_t pid = getpid(), child;
    signal(SIGUSR1, handle);
    if (!(child = fork())) {
      for (;;)
        kill(pid, SIGUSR1);
    }
    pause();
    printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
    kill(child, SIGTERM);
    return 0;
  }

Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

bnx2x: Fix undefined behavior due to shift overflowing the constant

Fix:

  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: In function ‘bnx2x_check_blocks_with_parity3’:
  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:4917:4: error: case label does not reduce to an integer constant
      case AEU_INPUTS_ATTN_BITS_MCP_LATCHED_SCPAD_PARITY:
      ^~~~

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220405151517.29753-4-bp@alien8.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: add accessors to read/set tp->snd_cwnd

We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.

Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.

Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets

The previous commit fixed support for dual-stack sockets in
bpf_tcp_check_syncookie. This commit adjusts the selftest to verify the
fixed functionality.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-2-maximmi@nvidia.com

bpf: Support dual-stack sockets in bpf_tcp_check_syncookie

bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.

On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:

1. Packets are not validated properly, allowing a BPF program to trick
   bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
   socket.

2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
   up receiving a SYNACK with the cookie, but the following ACK gets
   dropped.

This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.

Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com

ip6_tunnel: Remove duplicate assignments

There is a same action when the variable is initialized

Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: fix an incorrect free for skb in myri10ge_sw_tso

All remaining skbs should be released when myri10ge_xmit fails to
transmit a packet. Fix it within another skb_list_walk_safe.

Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: remove the lanmedia (lmc) driver

The driver for LAN Media WAN interfaces spews build warnings on
microblaze. The virt_to_bus() calls discard the volatile keyword.
The right thing to do would be to migrate this driver to a modern
DMA API but it seems unlikely anyone is actually using it.
There had been no fixes or functional changes here since
the git era begun.

Let's remove this driver, there isn't much changing in the APIs,
if users come forward we can apologize and revert.

Link: https://lore.kernel.org/all/20220321144013.440d7fc0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: aqc111: Fix out-of-bounds accesses in RX fixup

aqc111_rx_fixup() contains several out-of-bounds accesses that can be
triggered by a malicious (or defective) USB device, in particular:

- The metadata array (desc_offset..desc_offset+2*pkt_count) can be out of bounds,
   causing OOB reads and (on big-endian systems) OOB endianness flips.
- A packet can overlap the metadata array, causing a later OOB
   endianness flip to corrupt data used by a cloned SKB that has already
   been handed off into the network stack.
- A packet SKB can be constructed whose tail is far beyond its end,
   causing out-of-bounds heap data to be considered part of the SKB's
   data.

Found doing variant analysis. Tested it with another driver (ax88179_178a), since
I don't have a aqc111 device to test it, but the code looks very similar.

Signed-off-by: Marcin Kozlowski <marcinguy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: remove duplicate assignment

netdev_alloc_skb() has assigned ssi->netdev to skb->dev if successed,
no need to repeat assignment.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: xilinx: use of_property_read_bool() instead of of_get_property

"little-endian" has no specific content, use more helper function
of_property_read_bool() instead of of_get_property()

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: confirm skb is allocated before using

qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.

Add a check in case build_skb() failed to allocate and return NULL.

The NULL return is handled correctly in callers to qede_build_skb().

Fixes: 8a8633978b842 ("qede: Add build_skb() support.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n

net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'

Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used.

Fixes: 4b340a5a726d ("net: ip6mr: add support for passing full packet on wrong mif")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-04-05

Maciej Fijalkowski says:

We were solving issues around AF_XDP busy poll's not-so-usual scenarios,
such as very big busy poll budgets applied to very small HW rings. This
set carries the things that were found during that work that apply to
net tree.

One thing that was fixed for all in-tree ZC drivers was missing on ice
side all the time - it's about syncing RCU before destroying XDP
resources. Next one fixes the bit that is checked in ice_xsk_wakeup and
third one avoids false setting of DD bits on Tx descriptors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()

Following the recommendation in Documentation/memory-barriers.txt for
virtual machine guests.

Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>

Drivers: hv: balloon: Disable balloon and hot-add accordingly

Currently there are known potential issues for balloon and hot-add on
ARM64:

* Unballoon requests from Hyper-V should only unballoon ranges
that are guest page size aligned, otherwise guests cannot handle
because it's impossible to partially free a page. This is a
problem when guest page size > 4096 bytes.

* Memory hot-add requests from Hyper-V should provide the NUMA
node id of the added ranges or ARM64 should have a functional
memory_add_physaddr_to_nid(), otherwise the node id is missing
for add_memory().

These issues require discussions on design and implementation. In the
meanwhile, post_status() is working and essential to guest monitoring.
Therefore instead of disabling the entire hv_balloon driver, the
ballooning (when page size > 4096 bytes) and hot-add are disabled
accordingly for now. Once the issues are fixed, they can be re-enable in
these cases.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220325023212.1570049-3-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>

Drivers: hv: balloon: Support status report for larger page sizes

DM_STATUS_REPORT expects the numbers of pages in the unit of 4k pages
(HV_HYP_PAGE) instead of guest pages, so to make it work when guest page
sizes are larger than 4k, convert the numbers of guest pages into the
numbers of HV_HYP_PAGEs.

Note that the numbers of guest pages are still used for tracing because
tracing is internal to the guest kernel.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220325023212.1570049-2-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>

random: check for signal_pending() outside of need_resched() check

signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
signal that the task should bail out of the syscall when possible. This
is a separate concept from need_resched(), which checks
TIF_NEED_RESCHED, signaling that the task should preempt.

In particular, with the current code, the signal_pending() bailout
probably won't work reliably.

Change this to look like other functions that read lots of data, such as
read_zero().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Merge branch 'mtk_eth_soc-flo-offload-plus-wireless'

Felix Fietkau says:

====================
MediaTek SoC flow offload improvements + wireless support

This series contains the following improvements to mediatek ethernet flow
offload support:

- support dma-coherent on ethernet to improve performance
- add ipv6 offload support
- rework hardware flow table entry handling to improve dealing with hash
  collisions and competing flows
- support creating offload entries from user space
- support creating offload entries with just source/destination mac address,
  vlan and output device information
- add driver changes for supporting the Wireless Ethernet Dispatch core,
  which can be used to offload flows from ethernet to MT7915 PCIe WLAN
  devices

Changes in v2:
- add missing dt-bindings patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: support creating mac address based offload entries

This will be used to implement a limited form of bridge offloading.
Since the hardware does not support flow table entries with just source
and destination MAC address, the driver has to emulate it.

The hardware automatically creates entries entries for incoming flows, even
when they are bridged instead of routed, and reports when packets for these
flows have reached the minimum PPS rate for offloading.

After this happens, we look up the L2 flow offload entry based on the MAC
header and fill in the output routing information in the flow table.
The dynamically created per-flow entries are automatically removed when
either the hardware flowtable entry expires, is replaced, or if the offload
rule they belong to is removed

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: remove bridge flow offload type entry support

According to MediaTek, this feature is not supported in current hardware

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rework hardware flow table management

The hardware was designed to handle flow detection and creation of flow entries
by itself, relying on the software primarily for filling in egress routing
information.
When there is a hash collision between multiple flows, this allows the hardware
to maintain the entry for the most active flow.
Additionally, the hardware only keeps offloading active for entries with at
least 30 packets per second.

With this rework, the code no longer creates a hardware entries directly.
Instead, the hardware entry is only created when the PPE reports a matching
unbound flow with the minimum target rate.
In order to reduce CPU overhead, looking for flows belonging to a hash entry
is rate limited to once every 100ms.

This rework is also used as preparation for emulating bridge offload by
managing L4 offload entries on demand.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: allocate struct mtk_ppe separately

Preparation for adding more data to it, which will increase its size.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: support TC_SETUP_BLOCK for PPE offload

This allows offload entries to be created from user space

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: add ipv6 flow offload support

Add the missing IPv6 flow offloading support for routing only.
Hardware flow offloading is done by the packet processing engine (PPE)
of the Ethernet MAC and as it doesn't support mangling of IPv6 packets,
IPv6 NAT cannot be supported.

Signed-off-by: David Bentham <db260179@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: mediatek: mt7622: introduce nodes for Wireless Ethernet Dispatch

Introduce wed0 and wed1 nodes in order to enable offloading forwarding
between ethernet and wireless devices on the mt7622 chipset.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: implement flow offloading to WED devices

This allows hardware flow offloading from Ethernet to WLAN on MT7622 SoC

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)

The Wireless Ethernet Dispatch subsystem on the MT7622 SoC can be
configured to intercept and handle access to the DMA queues and
PCIe interrupts for a MT7615/MT7915 wireless card.
It can manage the internal WDMA (Wireless DMA) controller, which allows
ethernet packets to be passed from the packet switch engine (PSE) to the
wireless card, bypassing the CPU entirely.
This can be used to implement hardware flow offloading from ethernet to
WLAN.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: arm: mediatek: document the pcie mirror node on MT7622

This patch adds the pcie mirror document bindings for MT7622 SoC.
The feature is used for intercepting PCIe MMIO access for the WED core
Add related info in mediatek-net bindings.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: arm: mediatek: document WED binding for MT7622

Document the binding for the Wireless Ethernet Dispatch core on the MT7622
SoC, which is used for Ethernet->WLAN offloading
Add related info in mediatek-net bindings.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: mediatek: mt7622: add support for coherent DMA

It improves performance by eliminating the need for a cache flush on rx and tx

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: add support for coherent DMA

It improves performance by eliminating the need for a cache flush on rx and tx
In preparation for supporting WED (Wireless Ethernet Dispatch), also add a
function for disabling coherent DMA at runtime.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: mediatek: add optional properties for the SoC ethernet core

Introduce dma-coherent, cci-control and hifsys optional properties to
the mediatek ethernet controller bindings

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

random: do not allow user to keep crng key around on stack

The fast key erasure RNG design relies on the key that's used to be used
and then discarded. We do this, making judicious use of
memzero_explicit(). However, reads to /dev/urandom and calls to
getrandom() involve a copy_to_user(), and userspace can use FUSE or
userfaultfd, or make a massive call, dynamically remap memory addresses
as it goes, and set the process priority to idle, in order to keep a
kernel stack alive indefinitely. By probing
/proc/sys/kernel/random/entropy_avail to learn when the crng key is
refreshed, a malicious userspace could mount this attack every 5 minutes
thereafter, breaking the crng's forward secrecy.

In order to fix this, we just overwrite the stack's key with the first
32 bytes of the "free" fast key erasure output. If we're returning <= 32
bytes to the user, then we can still return those bytes directly, so
that short reads don't become slower. And for long reads, the difference
is hopefully lost in the amortization, so it doesn't change much, with
that amortization helping variously for medium reads.

We don't need to do this for get_random_bytes() and the various
kernel-space callers, and later, if we ever switch to always batching,
this won't be necessary either, so there's no need to change the API of
these functions.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jann Horn <jannh@google.com>
Fixes: c92e040d575a ("random: add backtracking protection to the CRNG")
Fixes: 186873c549df ("random: use simpler fast key erasure flow on per-cpu keys")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Merge branch 'mscc-miim'

Michael Walle says:

====================
net: phy: mscc-miim: add MDIO bus frequency support

Introduce MDIO bus frequency support. This way the board can have a
faster (or maybe slower) bus frequency than the hardware default.

changes since v2:
- resend, no RFC anymore, because net-next is open again
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc-miim: add support to set MDIO bus frequency

Until now, the MDIO bus will have the hardware default bus frequency.
Read the desired frequency of the bus from the device tree and configure
it.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: mscc-miim: add clock and clock-frequency

Add the (optional) clock input of the MDIO controller and indicate that
the common clock-frequency property is supported. The driver can use it
to set the desired MDIO bus frequency.

Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: convert mscc-miim to YAML format

Convert the mscc-miim device tree binding to the new YAML format.

The original binding don't mention if the interrupt property is optional
or not. But on the SparX-5 SoC, for example, the interrupt property isn't
used, thus in the new binding that property is optional. FWIW the driver
doesn't use interrupts at all.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc-miim: reject clause 45 register accesses

The driver doesn't support clause 45 register access yet, but doesn't
check if the access is a c45 one either. This leads to spurious register
reads and writes. Add the check.

Fixes: 542671fe4d86 ("net: phy: mscc-miim: Add MDIO driver")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>