platform/kernel/linux-rpi.git
2 years agoocteon_ep: Fix spelling mistake "inerrupts" -> "interrupts"
Colin Ian King [Thu, 14 Apr 2022 08:08:34 +0000 (09:08 +0100)]
octeon_ep: Fix spelling mistake "inerrupts" -> "interrupts"

There is a spelling mistake in a dev_info message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mlxsw-line-card-prep'
David S. Miller [Fri, 15 Apr 2022 10:06:13 +0000 (11:06 +0100)]
Merge branch 'mlxsw-line-card-prep'

Ido Schimmel says:

====================
mlxsw: Preparations for line cards support

Currently, mlxsw registers thermal zones as well as hwmon entries for
objects such as transceiver modules and gearboxes. In upcoming modular
systems, these objects are no longer found on the main board (i.e., slot
0), but on plug-able line cards. This patchset prepares mlxsw for such
systems in terms of hwmon, thermal and cable access support.

Patches #1-#3 gradually prepare mlxsw for transceiver modules access
support for line cards by splitting some of the internal structures and
some APIs.

Patches #4-#5 gradually prepare mlxsw for hwmon support for line cards
by splitting some of the internal structures and augmenting them with a
slot index.

Patches #6-#7 do the same for thermal zones.

Patch #8 selects cooling device for binding to a thermal zone by exact
name match to prevent binding to non-relevant devices.

Patch #9 replaces internal define for thermal zone name length with a
common define.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core_thermal: Use common define for thermal zone name length
Vadim Pasternak [Wed, 13 Apr 2022 15:17:33 +0000 (18:17 +0300)]
mlxsw: core_thermal: Use common define for thermal zone name length

Replace internal define 'MLXSW_THERMAL_ZONE_MAX_NAME' by common
'THERMAL_NAME_LENGTH'.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core_thermal: Use exact name of cooling devices for binding
Vadim Pasternak [Wed, 13 Apr 2022 15:17:32 +0000 (18:17 +0300)]
mlxsw: core_thermal: Use exact name of cooling devices for binding

Modular system supports additional cooling devices "mlxreg_fan1",
"mlxreg_fan2", etcetera. Thermal zones in "mlxsw" driver should be
bound to the same device as before called "mlxreg_fan". Used exact
match for cooling device name to avoid binding to new additional
cooling devices.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core_thermal: Add line card id prefix to line card thermal zone name
Vadim Pasternak [Wed, 13 Apr 2022 15:17:31 +0000 (18:17 +0300)]
mlxsw: core_thermal: Add line card id prefix to line card thermal zone name

Add prefix "lc#n" to thermal zones associated with the thermal objects
found on line cards.

For example thermal zone for module #9 located at line card #7 will
have type:
mlxsw-lc7-module9.
And thermal zone for gearbox #3 located at line card #5 will have type:
mlxsw-lc5-gearbox3.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core_thermal: Extend internal structures to support multi thermal areas
Vadim Pasternak [Wed, 13 Apr 2022 15:17:30 +0000 (18:17 +0300)]
mlxsw: core_thermal: Extend internal structures to support multi thermal areas

Introduce intermediate level for thermal zones areas.
Currently all thermal zones are associated with thermal objects located
within the main board. Such objects are created during driver
initialization and removed during driver de-initialization.

For line cards in modular system the thermal zones are to be associated
with the specific line card. They should be created whenever new line
card is available (inserted, validated, powered and enabled) and
removed, when line card is getting unavailable.
The thermal objects found on the line card #n are accessed by setting
slot index to #n, while for access to objects found on the main board
slot index should be set to default value zero.

Each thermal area contains the set of thermal zones associated with
particular slot index.
Thus introduction of thermal zone areas allows to use the same APIs for
the main board and line cards, by adding slot index argument.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core_hwmon: Introduce slot parameter in hwmon interfaces
Vadim Pasternak [Wed, 13 Apr 2022 15:17:29 +0000 (18:17 +0300)]
mlxsw: core_hwmon: Introduce slot parameter in hwmon interfaces

Add 'slot' parameter to 'mlxsw_hwmon_dev' structure. Use this parameter
in mlxsw_reg_mtmp_pack(), mlxsw_reg_mtbr_pack(), mlxsw_reg_mgpir_pack()
and mlxsw_reg_mtmp_slot_index_set() routines.
For main board it'll always be zero, for line cards it'll be set to
the physical slot number in modular systems.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core_hwmon: Extend internal structures to support multi hwmon objects
Vadim Pasternak [Wed, 13 Apr 2022 15:17:28 +0000 (18:17 +0300)]
mlxsw: core_hwmon: Extend internal structures to support multi hwmon objects

Currently, mlxsw supports a single hwmon device and registers it with
attributes corresponding to the various objects found on the main
board such as fans and gearboxes.

Line cards can have the same objects, but unlike the main board they
can be added and removed while the system is running. The various
hwmon objects found on these line cards should be created when the
line card becomes available and destroyed when the line card becomes
unavailable.

The above can be achieved by representing each line card as a
different hwmon device and registering / unregistering it when the
line card becomes available / unavailable.

Prepare for multi hwmon device support by splitting
'struct mlxsw_hwmon' into 'struct mlxsw_hwmon' and
'struct mlxsw_hwmon_dev'. The first will hold information relevant to
all hwmon devices, whereas the second will hold per-hwmon device
information.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core: Move port module events enablement to a separate function
Vadim Pasternak [Wed, 13 Apr 2022 15:17:27 +0000 (18:17 +0300)]
mlxsw: core: Move port module events enablement to a separate function

Use a separate function for enablement of port module events such
plug/unplug and temperature threshold crossing. The motivation is to
reuse the function for line cards.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core: Extend port module data structures for line cards
Vadim Pasternak [Wed, 13 Apr 2022 15:17:26 +0000 (18:17 +0300)]
mlxsw: core: Extend port module data structures for line cards

The port module core is tasked with module operations such as setting
power mode policy and reset. The per-module information is currently
stored in one large array suited for non-modular systems where only the
main board is present (i.e., slot index 0).

As a preparation for line cards support, allocate a per line card array
according to the queried number of slots in the system. For each line
card, allocate a module array according to the queried maximum number of
modules per-slot.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: core: Extend interfaces for cable info access with slot argument
Vadim Pasternak [Wed, 13 Apr 2022 15:17:25 +0000 (18:17 +0300)]
mlxsw: core: Extend interfaces for cable info access with slot argument

Extend all cable info APIs with 'slot_index' argument.

For main board, slot will always be set to zero and these APIs will work
as before. If reading cable information is required from cages located
on line cards, slot should be set to the physical slot number, where
line card is located in modular systems.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw_priv: using pm_runtime_resume_and_get instead of pm_runtime_g...
Minghao Chi [Wed, 13 Apr 2022 09:38:36 +0000 (09:38 +0000)]
net: ethernet: ti: cpsw_priv: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: stmmac_main: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Minghao Chi [Wed, 13 Apr 2022 09:38:01 +0000 (09:38 +0000)]
net: stmmac: stmmac_main: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw_new: use pm_runtime_resume_and_get() instead of pm_runtime_ge...
Minghao Chi [Wed, 13 Apr 2022 09:35:29 +0000 (09:35 +0000)]
net: ethernet: ti: cpsw_new: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogeneve: avoid indirect calls in GRO path, when possible
Paolo Abeni [Wed, 13 Apr 2022 08:44:40 +0000 (10:44 +0200)]
geneve: avoid indirect calls in GRO path, when possible

In the most common setups, the geneve tunnels use an inner
ethernet encapsulation. In the GRO path, when such condition is
true, we can call directly the relevant GRO helper and avoid
a few indirect calls.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mneta-page_pool_get_stats'
David S. Miller [Fri, 15 Apr 2022 09:43:48 +0000 (10:43 +0100)]
Merge branch 'mneta-page_pool_get_stats'

Lorenzo Bianconi says:

====================
net: mvneta: add support for page_pool_get_stats

Introduce page_pool stats ethtool APIs in order to avoid driver duplicated
code.

Changes since v4:
- rebase on top of net-next

Changes since v3:
- get rid of wrong for loop in page_pool_ethtool_stats_get()
- add API stubs when page_pool_stats are not compiled in

Changes since v2:
- remove enum list of page_pool stats in page_pool.h
- remove leftover change in mvneta.c for ethtool_stats array allocation

Changes since v1:
- move stats accounting to page_pool code
- move stats string management to page_pool code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mvneta: add support for page_pool_get_stats
Lorenzo Bianconi [Tue, 12 Apr 2022 16:31:59 +0000 (18:31 +0200)]
net: mvneta: add support for page_pool_get_stats

Introduce support for the page_pool stats API into mvneta driver.
Report page_pool stats through ethtool.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: page_pool: introduce ethtool stats
Lorenzo Bianconi [Tue, 12 Apr 2022 16:31:58 +0000 (18:31 +0200)]
net: page_pool: introduce ethtool stats

Introduce page_pool APIs to report stats through ethtool and reduce
duplicated code in each driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Paolo Abeni [Fri, 15 Apr 2022 07:26:00 +0000 (09:26 +0200)]
Merge git://git./linux/kernel/git/netdev/net

2 years agoMerge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 14 Apr 2022 18:58:19 +0000 (11:58 -0700)]
Merge tag 'net-5.18-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless and netfilter.

  Current release - regressions:

   - smc: fix af_ops of child socket pointing to released memory

   - wifi: ath9k: fix usage of driver-private space in tx_info

  Previous releases - regressions:

   - ipv6: fix panic when forwarding a pkt with no in6 dev

   - sctp: use the correct skb for security_sctp_assoc_request

   - smc: fix NULL pointer dereference in smc_pnet_find_ib()

   - sched: fix initialization order when updating chain 0 head

   - phy: don't defer probe forever if PHY IRQ provider is missing

   - dsa: revert "net: dsa: setup master before ports"

   - dsa: felix: fix tagging protocol changes with multiple CPU ports

   - eth: ice:
      - fix use-after-free when freeing @rx_cpu_rmap
      - revert "iavf: fix deadlock occurrence during resetting VF
        interface"

   - eth: lan966x: stop processing the MAC entry is port is wrong

  Previous releases - always broken:

   - sched:
      - flower: fix parsing of ethertype following VLAN header
      - taprio: check if socket flags are valid

   - nfc: add flush_workqueue to prevent uaf

   - veth: ensure eth header is in skb's linear part

   - eth: stmmac: fix altr_tse_pcs function when using a fixed-link

   - eth: macb: restart tx only if queue pointer is lagging

   - eth: macvlan: fix leaking skb in source mode with nodst option"

* tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
  rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
  net: dsa: felix: fix tagging protocol changes with multiple CPU ports
  tun: annotate access to queue->trans_start
  nfc: nci: add flush_workqueue to prevent uaf
  net: dsa: realtek: don't parse compatible string for RTL8366S
  net: dsa: realtek: fix Kconfig to assure consistent driver linkage
  net: ftgmac100: access hardware register after clock ready
  Revert "net: dsa: setup master before ports"
  macvlan: Fix leaking skb in source mode with nodst option
  netfilter: nf_tables: nft_parse_register can return a negative value
  net: lan966x: Stop processing the MAC entry is port is wrong.
  net: lan966x: Fix when a port's upper is changed.
  net: lan966x: Fix IGMP snooping when frames have vlan tag
  net: lan966x: Update lan966x_ptp_get_nominal_value
  sctp: Initialize daddr on peeled off socket
  net/smc: Fix af_ops of child socket pointing to released memory
  net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
  net/smc: use memcpy instead of snprintf to avoid out of bounds read
  net: macb: Restart tx only if queue pointer is lagging
  ...

2 years agoMerge tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 14 Apr 2022 18:08:12 +0000 (11:08 -0700)]
Merge tag 'sound-5.18-rc3' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This became an unexpectedly large pull request due to various
  regression fixes in the previous kernels.

  The majority of fixes are a series of patches to address the
  regression at probe errors in devres'ed drivers, while there are yet
  more fixes for the x86 SG allocations and for USB-audio buffer
  management. In addition, a few HD-audio quirks and other small fixes
  are found"

* tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
  ALSA: usb-audio: Limit max buffer and period sizes per time
  ALSA: memalloc: Add fallback SG-buffer allocations for x86
  ALSA: nm256: Don't call card private_free at probe error path
  ALSA: mtpav: Don't call card private_free at probe error path
  ALSA: rme9652: Fix the missing snd_card_free() call at probe error
  ALSA: hdspm: Fix the missing snd_card_free() call at probe error
  ALSA: hdsp: Fix the missing snd_card_free() call at probe error
  ALSA: oxygen: Fix the missing snd_card_free() call at probe error
  ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
  ALSA: cmipci: Fix the missing snd_card_free() call at probe error
  ALSA: aw2: Fix the missing snd_card_free() call at probe error
  ALSA: als300: Fix the missing snd_card_free() call at probe error
  ALSA: lola: Fix the missing snd_card_free() call at probe error
  ALSA: bt87x: Fix the missing snd_card_free() call at probe error
  ALSA: sis7019: Fix the missing error handling
  ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
  ALSA: via82xx: Fix the missing snd_card_free() call at probe error
  ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
  ALSA: rme96: Fix the missing snd_card_free() call at probe error
  ALSA: rme32: Fix the missing snd_card_free() call at probe error
  ...

2 years agoMerge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 14 Apr 2022 17:58:27 +0000 (10:58 -0700)]
Merge tag 'for-5.18-rc2-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more code and warning fixes.

  There's one feature ioctl removal patch slated for 5.18 that did not
  make it to the main pull request. It's just a one-liner and the ioctl
  has a v2 that's in use for a long time, no point to postpone it to
  5.19.

  Late update:

   - remove balance v1 ioctl, superseded by v2 in 2012

  Fixes:

   - add back cgroup attribution for compressed writes

   - add super block write start/end annotations to asynchronous balance

   - fix root reference count on an error handling path

   - in zoned mode, activate zone at the chunk allocation time to avoid
     ENOSPC due to timing issues

   - fix delayed allocation accounting for direct IO

  Warning fixes:

   - simplify assertion condition in zoned check

   - remove an unused variable"

* tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix btrfs_submit_compressed_write cgroup attribution
  btrfs: fix root ref counts in error handling in btrfs_get_root_ref
  btrfs: zoned: activate block group only for extent allocation
  btrfs: return allocated block group from do_chunk_alloc()
  btrfs: mark resumed async balance as writing
  btrfs: remove support of balance v1 ioctl
  btrfs: release correct delalloc amount in direct IO write path
  btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
  btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range

2 years agoMerge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 14 Apr 2022 17:51:20 +0000 (10:51 -0700)]
Merge tag 'fscache-fixes-20220413' of git://git./linux/kernel/git/dhowells/linux-fs

Pull fscache fixes from David Howells:
 "Here's a collection of fscache and cachefiles fixes and misc small
  cleanups. The two main fixes are:

   - Add a missing unmark of the inode in-use mark in an error path.

   - Fix a KASAN slab-out-of-bounds error when setting the xattr on a
     cachefiles volume due to the wrong length being given to memcpy().

  In addition, there's the removal of an unused parameter, removal of an
  unused Kconfig option, conditionalising a bit of procfs-related stuff
  and some doc fixes"

* tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache: remove FSCACHE_OLD_API Kconfig option
  fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
  fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
  fscache: Remove the cookie parameter from fscache_clear_page_bits()
  docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
  docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
  cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
  cachefiles: unmark inode in use in error path

2 years agoMerge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices'
Paolo Abeni [Thu, 14 Apr 2022 13:08:16 +0000 (15:08 +0200)]
Merge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices'

Lech Perczak says:

====================
rndis_host: handle bogus MAC addresses in ZTE RNDIS devices

When porting support of ZTE MF286R to OpenWrt [1], it was discovered,
that its built-in LTE modem fails to adjust its target MAC address,
when a random MAC address is assigned to the interface, due to detection of
"locally-administered address" bit. This leads to dropping of ingress
trafficat the host. The modem uses RNDIS as its primary interface,
with some variants exposing both of them simultaneously.

Then it was discovered, that cdc_ether driver contains a fixup for that
exact issue, also appearing on CDC ECM interfaces.
I discussed how to proceed with that with Bjørn Mork at OpenWrt forum [3],
with the first approach would be to trust the locally-administered MAC
again, and add a quirk for the problematic ZTE devices, as suggested by
Kristian Evensen. before [4], but reusing the fixup from cdc_ether looks
like a safer and more generic solution.

Finally, according to Bjørn's suggestion. limit the scope of bogus MAC
addressdetection to ZTE devices, the same way as it is done in cdc_ether,
as this trait wasn't really observed outside of ZTE devices.
Do that for both flavours of RNDIS devices, with interface classes
02/02/ff and e0/01/03, as both types are reported by different modems.

[1] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=7ac8da00609f42b8aba74b7efc6b0d055b7cef3e
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfe9b9d2df669a57a95d641ed46eb018e204c6ce
[3] https://forum.openwrt.org/t/problem-with-modem-in-zte-mf286r/120988
[4] https://lore.kernel.org/all/CAKfDRXhDp3heiD75Lat7cr1JmY-kaJ-MS0tt7QXX=s8RFjbpUQ@mail.gmail.com/T/

Cc: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
v3: Fixed wrong identifier commit description and whitespace in patch 2.

v2: ensure that MAC fixup is applied to all Ethernet frames in RNDIS
batch, by introducing a driver flag, and integrating the fixup inside
rndis_rx_fixup().
====================

Link: https://lore.kernel.org/r/20220413014416.2306843-1-lech.perczak@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agorndis_host: limit scope of bogus MAC address detection to ZTE devices
Lech Perczak [Wed, 13 Apr 2022 01:44:16 +0000 (03:44 +0200)]
rndis_host: limit scope of bogus MAC address detection to ZTE devices

Reporting of bogus MAC addresses and ignoring configuration of new
destination address wasn't observed outside of a range of ZTE devices,
among which this seems to be the common bug. Align rndis_host driver
with implementation found in cdc_ether, which also limits this workaround
to ZTE devices.

Suggested-by: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agorndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether
Lech Perczak [Wed, 13 Apr 2022 01:44:15 +0000 (03:44 +0200)]
rndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether

Certain ZTE modems, namely: MF823. MF831, MF910, built-in modem from
MF286R, expose both CDC-ECM and RNDIS network interfaces.
They have a trait of ignoring the locally-administered MAC address
configured on the interface both in CDC-ECM and RNDIS part,
and this leads to dropping of incoming traffic by the host.
However, the workaround was only present in CDC-ECM, and MF286R
explicitly requires it in RNDIS mode.

Re-use the workaround in rndis_host as well, to fix operation of MF286R
module, some versions of which expose only the RNDIS interface. Do so by
introducing new flag, RNDIS_DRIVER_DATA_DST_MAC_FIXUP, and testing for it
in rndis_rx_fixup. This is required, as RNDIS uses frame batching, and all
of the packets inside the batch need the fixup. This might introduce a
performance penalty, because test is done for every returned Ethernet
frame.

Apply the workaround to both "flavors" of RNDIS interfaces, as older ZTE
modems, like MF823 found in the wild, report the USB_CLASS_COMM class
interfaces, while MF286R reports USB_CLASS_WIRELESS_CONTROLLER.

Suggested-by: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocdc_ether: export usbnet_cdc_zte_rx_fixup
Lech Perczak [Wed, 13 Apr 2022 01:44:14 +0000 (03:44 +0200)]
cdc_ether: export usbnet_cdc_zte_rx_fixup

Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduces a workaround for certain ZTE modems reporting invalid MAC
addresses over CDC-ECM.
The same issue was present on their RNDIS interface,which was fixed in
commit a5a18bdf7453 ("rndis_host: Set valid random MAC on buggy devices").

However, internal modem of ZTE MF286R router, on its RNDIS interface, also
exhibits a second issue fixed already in CDC-ECM, of the device not
respecting configured random MAC address. In order to share the fixup for
this with rndis_host driver, export the workaround function, which will
be re-used in the following commit in rndis_host.

Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
Jeremy Linton [Tue, 12 Apr 2022 21:04:20 +0000 (16:04 -0500)]
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"

It turns out after digging deeper into this bug, that it was being
triggered by GCC12 failing to call the bcmgenet_enable_dma()
routine. Given that a gcc12 fix has been merged [1] and the genet
driver now works properly when built with gcc12, this commit should
be reverted.

[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aabb9a261ef060cf24fd626713f1d7d9df81aa57

Fixes: 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes to assure ordering")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220412210420.1129430-1-jeremy.linton@arm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agortnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
Petr Machata [Tue, 12 Apr 2022 20:25:06 +0000 (22:25 +0200)]
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies

When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns
size of 0, which is supposed to be an indication that the corresponding
attribute should not be emitted. However, instead, the current code
reserves a 0-byte attribute.

The reason this does not show up as a citation on a kasan kernel is that
netdev_offload_xstats_get(), which is supposed to fill in the data, never
ends up getting called, because rtnl_offload_xstats_get_stats() notices
that the stats are not actually used and skips the call.

Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a
response, confusing the userspace.

Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill().

Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: dsa: felix: fix tagging protocol changes with multiple CPU ports
Vladimir Oltean [Tue, 12 Apr 2022 17:22:09 +0000 (20:22 +0300)]
net: dsa: felix: fix tagging protocol changes with multiple CPU ports

When the device tree has 2 CPU ports defined, a single one is active
(has any dp->cpu_dp pointers point to it). Yet the second one is still a
CPU port, and DSA still calls ->change_tag_protocol on it.

On the NXP LS1028A, the CPU ports are ports 4 and 5. Port 4 is the
active CPU port and port 5 is inactive.

After the following commands:

 # Initial setting
 cat /sys/class/net/eno2/dsa/tagging
 ocelot
 echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
 echo ocelot > /sys/class/net/eno2/dsa/tagging

traffic is now broken, because the driver has moved the NPI port from
port 4 to port 5, unbeknown to DSA.

The problem can be avoided by detecting that the second CPU port is
unused, and not doing anything for it. Further rework will be needed
when proper support for multiple CPU ports is added.

Treat this as a bug and prepare current kernels to work in single-CPU
mode with multiple-CPU DT blobs.

Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220412172209.2531865-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agotun: annotate access to queue->trans_start
Antoine Tenart [Tue, 12 Apr 2022 13:58:52 +0000 (15:58 +0200)]
tun: annotate access to queue->trans_start

Commit 5337824f4dc4 ("net: annotate accesses to queue->trans_start")
introduced a new helper, txq_trans_cond_update, to update
queue->trans_start using WRITE_ONCE. One snippet in drivers/net/tun.c
was missed, as it was introduced roughly at the same time.

Fixes: 5337824f4dc4 ("net: annotate accesses to queue->trans_start")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220412135852.466386-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonfc: nci: add flush_workqueue to prevent uaf
Lin Ma [Tue, 12 Apr 2022 16:04:30 +0000 (00:04 +0800)]
nfc: nci: add flush_workqueue to prevent uaf

Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).

The race can be demonstrated below:

Thread-1                           Thread-2
                                 | nci_dev_up()
                                 |   nci_open_device()
                                 |     __nci_request(nci_reset_req)
                                 |       nci_send_cmd
                                 |         queue_work(cmd_work)
nci_unregister_device()          |
  nci_close_device()             | ...
    del_timer_sync(cmd_timer)[1] |
...                              | Worker
nci_free_device()                | nci_cmd_work()
  kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]

In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.

This UAF is easy to trigger, crash trace by POC is like below

[   66.703713] ==================================================================
[   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[   66.703974]
[   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[   66.703974] Call Trace:
[   66.703974]  <TASK>
[   66.703974]  dump_stack_lvl+0x57/0x7d
[   66.703974]  print_report.cold+0x5e/0x5db
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  kasan_report+0xbe/0x1c0
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  enqueue_timer+0x448/0x490
[   66.703974]  __mod_timer+0x5e6/0xb80
[   66.703974]  ? mark_held_locks+0x9e/0xe0
[   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
[   66.703974]  ? queue_work_on+0x61/0x80
[   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
[   66.703974]  process_one_work+0x8bb/0x1510
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
[   66.703974]  ? rwlock_bug.part.0+0x90/0x90
[   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
[   66.703974]  worker_thread+0x575/0x1190
[   66.703974]  ? process_one_work+0x1510/0x1510
[   66.703974]  kthread+0x2a0/0x340
[   66.703974]  ? kthread_complete_and_exit+0x20/0x20
[   66.703974]  ret_from_fork+0x22/0x30
[   66.703974]  </TASK>
[   66.703974]
[   66.703974] Allocated by task 267:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  __kasan_kmalloc+0x81/0xa0
[   66.703974]  nci_allocate_device+0xd3/0x390
[   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
[   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   66.703974]  tty_ioctl+0x764/0x1310
[   66.703974]  __x64_sys_ioctl+0x122/0x190
[   66.703974]  do_syscall_64+0x3b/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   66.703974]
[   66.703974] Freed by task 406:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  kasan_set_track+0x21/0x30
[   66.703974]  kasan_set_free_info+0x20/0x30
[   66.703974]  __kasan_slab_free+0x108/0x170
[   66.703974]  kfree+0xb0/0x330
[   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
[   66.703974]  nci_uart_tty_close+0xdf/0x180
[   66.703974]  tty_ldisc_kill+0x73/0x110
[   66.703974]  tty_ldisc_hangup+0x281/0x5b0
[   66.703974]  __tty_hangup.part.0+0x431/0x890
[   66.703974]  tty_release+0x3a8/0xc80
[   66.703974]  __fput+0x1f0/0x8c0
[   66.703974]  task_work_run+0xc9/0x170
[   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
[   66.703974]  syscall_exit_to_user_mode+0x19/0x50
[   66.703974]  do_syscall_64+0x48/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae

To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: realtek: don't parse compatible string for RTL8366S
Alvin Šipraga [Tue, 12 Apr 2022 15:57:49 +0000 (17:57 +0200)]
net: dsa: realtek: don't parse compatible string for RTL8366S

This switch is not even supported, but if someone were to actually put
this compatible string "realtek,rtl8366s" in their device tree, they
would be greeted with a kernel panic because the probe function would
dereference NULL. So let's just remove it.

Link: https://lore.kernel.org/all/CACRpkdYdKZs0WExXc3=0yPNOwP+oOV60HRz7SRoGjZvYHaT=1g@mail.gmail.com/
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: realtek: fix Kconfig to assure consistent driver linkage
Alvin Šipraga [Tue, 12 Apr 2022 15:55:27 +0000 (17:55 +0200)]
net: dsa: realtek: fix Kconfig to assure consistent driver linkage

The kernel test robot reported a build failure:

or1k-linux-ld: drivers/net/dsa/realtek/realtek-smi.o:(.rodata+0x16c): undefined reference to `rtl8366rb_variant'

... with the following build configuration:

CONFIG_NET_DSA_REALTEK=y
CONFIG_NET_DSA_REALTEK_SMI=y
CONFIG_NET_DSA_REALTEK_RTL8365MB=y
CONFIG_NET_DSA_REALTEK_RTL8366RB=m

The problem here is that the realtek-smi interface driver gets built-in,
while the rtl8366rb switch subdriver gets built as a module, hence the
symbol rtl8366rb_variant is not reachable when defining the OF device
table in the interface driver.

The Kconfig dependencies don't help in this scenario because they just
say that the subdriver(s) depend on at least one interface driver. In
fact, the subdrivers don't depend on the interface drivers at all, and
can even be built even in their absence. Somewhat strangely, the
interface drivers can also be built in the absence of any subdriver,
BUT, if a subdriver IS enabled, then it must be reachable according to
the linkage of the interface driver: effectively what the IS_REACHABLE()
macro achieves. If it is not reachable, the above kind of linker error
will be observed.

Rather than papering over the above build error by simply using
IS_REACHABLE(), we can do a little better and admit that it is actually
the interface drivers that have a dependency on the subdrivers. So this
patch does exactly that. Specifically, we ensure that:

1. The interface drivers' Kconfig symbols must have a value no greater
   than the value of any subdriver Kconfig symbols.

2. The subdrivers should by default enable both interface drivers, since
   most users probably want at least one of them; those interface
   drivers can be explicitly disabled however.

What this doesn't do is prevent a user from building only a subdriver,
without any interface driver. To that end, add an additional line of
help in the menu to guide users in the right direction.

Link: https://lore.kernel.org/all/202204110757.XIafvVnj-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: aac94001067d ("net: dsa: realtek: add new mdio interface for drivers")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: update nfp_X logging definitions
Dylan Muller [Tue, 12 Apr 2022 15:26:00 +0000 (17:26 +0200)]
nfp: update nfp_X logging definitions

Previously it was not possible to determine which code path was responsible
for generating a certain message after a call to the nfp_X messaging
definitions for cases of duplicate strings. We therefore modify nfp_err,
nfp_warn, nfp_info, nfp_dbg and nfp_printk to print the corresponding file
and line number where the nfp_X definition is used.

Signed-off-by: Dylan Muller <dylan.muller@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'wireless-2022-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 13 Apr 2022 12:13:34 +0000 (13:13 +0100)]
Merge tag 'wireless-2022-04-13' of git://git./linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.18

First set of fixes for v5.18. Maintainers file updates, two
compilation warning fixes, one revert for ath11k and smaller fixes to
drivers and stack. All the usual stuff.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ip-ingress-skb-reason'
David S. Miller [Wed, 13 Apr 2022 12:09:57 +0000 (13:09 +0100)]
Merge branch 'ip-ingress-skb-reason'

Menglong Dong says:

====================
net: ip: add skb drop reasons to ip ingress

In the series "net: use kfree_skb_reason() for ip/udp packet receive",
skb drop reasons are added to the basic ingress path of IPv4. And in
the series "net: use kfree_skb_reason() for ip/neighbour", the egress
paths of IPv4 and IPv6 are handled. Related links:

https://lore.kernel.org/netdev/20220205074739.543606-1-imagedong@tencent.com/
https://lore.kernel.org/netdev/20220226041831.2058437-1-imagedong@tencent.com/

Seems we still have a lot work to do with IP layer, including IPv6 basic
ingress path, IPv4/IPv6 forwarding, IPv6 exthdrs, fragment and defrag,
etc.

In this series, skb drop reasons are added to the basic ingress path of
IPv6 protocol and IPv4/IPv6 packet forwarding. Following functions, which
are used for IPv6 packet receiving are handled:

  ip6_pkt_drop()
  ip6_rcv_core()
  ip6_protocol_deliver_rcu()

And following functions that used for IPv6 TLV parse are handled:

  ip6_parse_tlv()
  ipv6_hop_ra()
  ipv6_hop_ioam()
  ipv6_hop_jumbo()
  ipv6_hop_calipso()
  ipv6_dest_hao()

Besides, ip_forward() and ip6_forward(), which are used for IPv4/IPv6
forwarding, are also handled. And following new drop reasons are added:

  /* host unreachable, corresponding to IPSTATS_MIB_INADDRERRORS */
  SKB_DROP_REASON_IP_INADDRERRORS
  /* network unreachable, corresponding to IPSTATS_MIB_INADDRERRORS */
  SKB_DROP_REASON_IP_INNOROUTES
  /* packet size is too big, corresponding to
   * IPSTATS_MIB_INTOOBIGERRORS
   */
  SKB_DROP_REASON_PKT_TOO_BIG

In order to simply the definition and assignment for
'enum skb_drop_reason', some helper functions are introduced in the 1th
patch. I'm not such if this is necessary, but it makes the code simpler.
For example, we can replace the code:

  if (reason == SKB_DROP_REASON_NOT_SPECIFIED)
          reason = SKB_DROP_REASON_IP_INHDR;

with:

  SKB_DR_OR(reason, IP_INHDR);

In the 6th patch, the statistics for skb in ipv6_hop_jum() is removed,
as I think it is redundant. There are two call chains for
ipv6_hop_jumbo(). The first one is:

  ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()

On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.

The second call chain is:

  ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()

And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.

Therefore, the statistics in ipv6_hop_jumbo() is redundant, which
means the drop is counted twice. The statistics in ipv6_hop_jumbo()
is almost the same as the outside, except the
'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it.

======================================================================

Here is a basic test for IPv6 forwarding packet drop that monitored by
'dropwatch' tool:

  drop at: ip6_forward+0x81a/0xb70 (0xffffffff86c73f8a)
  origin: software
  input port ifindex: 7
  timestamp: Wed Apr 13 11:51:06 2022 130010176 nsec
  protocol: 0x86dd
  length: 94
  original length: 94
  drop reason: IP_INADDRERRORS

The origin cause of this case is that IPv6 doesn't allow to forward the
packet with LOCAL-LINK saddr, and results the 'IP_INADDRERRORS' drop
reason.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()
Menglong Dong [Wed, 13 Apr 2022 08:16:00 +0000 (16:16 +0800)]
net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()

Replace kfree_skb() used in ip6_protocol_deliver_rcu() with
kfree_skb_reason().

No new reasons are added.

Some paths are ignored, as they are not common, such as encapsulation
on non-final protocol.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv6: add skb drop reasons to ip6_rcv_core()
Menglong Dong [Wed, 13 Apr 2022 08:15:59 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to ip6_rcv_core()

Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason().
No new drop reasons are added.

Seems now we use 'SKB_DROP_REASON_IP_INHDR' for too many case during
ipv6 header parse or check, just like what 'IPSTATS_MIB_INHDRERRORS'
do. Will it be too general and hard to know what happened?

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv6: add skb drop reasons to TLV parse
Menglong Dong [Wed, 13 Apr 2022 08:15:58 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to TLV parse

Replace kfree_skb() used in TLV encoded option header parsing with
kfree_skb_reason(). Following functions are involved:

ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()

Most skb drops during this process are regarded as 'InHdrErrors',
as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails,
which make we use 'SKB_DROP_REASON_IP_INHDR' correspondingly.

However, 'IP_INHDR' is a relatively general reason. Therefore, we
can use other reasons with higher priority in some cases. For example,
'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv6: remove redundant statistics in ipv6_hop_jumbo()
Menglong Dong [Wed, 13 Apr 2022 08:15:57 +0000 (16:15 +0800)]
net: ipv6: remove redundant statistics in ipv6_hop_jumbo()

There are two call chains for ipv6_hop_jumbo(). The first one is:

ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()

On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.

The second call chain is:

ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()

And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.

Therefore, the statistics in ipv6_hop_jumbo() is redundant, which
means the drop is counted twice. The statistics in ipv6_hop_jumbo()
is almost the same as the outside, except the
'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: icmp: introduce function icmpv6_param_prob_reason()
Menglong Dong [Wed, 13 Apr 2022 08:15:56 +0000 (16:15 +0800)]
net: icmp: introduce function icmpv6_param_prob_reason()

In order to add the skb drop reasons support to icmpv6_param_prob(),
introduce the function icmpv6_param_prob_reason() and make
icmpv6_param_prob() an inline call to it. This new function will be
used in the following patches.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ip: add skb drop reasons to ip forwarding
Menglong Dong [Wed, 13 Apr 2022 08:15:55 +0000 (16:15 +0800)]
net: ip: add skb drop reasons to ip forwarding

Replace kfree_skb() which is used in ip6_forward() and ip_forward()
with kfree_skb_reason().

The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
the case that the length of the packet exceeds MTU and can't
fragment.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv6: add skb drop reasons to ip6_pkt_drop()
Menglong Dong [Wed, 13 Apr 2022 08:15:54 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to ip6_pkt_drop()

Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason().
No new reason is added.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv4: add skb drop reasons to ip_error()
Menglong Dong [Wed, 13 Apr 2022 08:15:53 +0000 (16:15 +0800)]
net: ipv4: add skb drop reasons to ip_error()

Eventually, I find out the handler function for inputting route lookup
fail: ip_error().

The drop reasons we used in ip_error() are almost corresponding to
IPSTATS_MIB_*, and following new reasons are introduced:

SKB_DROP_REASON_IP_INADDRERRORS
SKB_DROP_REASON_IP_INNOROUTES

Isn't the name SKB_DROP_REASON_IP_HOSTUNREACH and
SKB_DROP_REASON_IP_NETUNREACH more accurate? To make them corresponding
to IPSTATS_MIB_*, we keep their name still.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoskb: add some helpers for skb drop reasons
Menglong Dong [Wed, 13 Apr 2022 08:15:52 +0000 (16:15 +0800)]
skb: add some helpers for skb drop reasons

In order to simply the definition and assignment for
'enum skb_drop_reason', introduce some helpers.

SKB_DR() is used to define a variable of type 'enum skb_drop_reason'
with the 'SKB_DROP_REASON_NOT_SPECIFIED' initial value.

SKB_DR_SET() is used to set the value of the variable. Seems it is
a little useless? But it makes the code shorter.

SKB_DR_OR() is used to set the value of the variable if it is not set
yet, which means its value is SKB_DROP_REASON_NOT_SPECIFIED.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'octeon_ep-driver'
David S. Miller [Wed, 13 Apr 2022 11:56:32 +0000 (12:56 +0100)]
Merge branch 'octeon_ep-driver'

Veerasenareddy Burru says:

====================
Add octeon_ep driver

This driver implements networking functionality of Marvell's Octeon
PCI Endpoint NIC.

This driver support following devices:
 * Network controller: Cavium, Inc. Device b200

V4 -> V5:
   - Fix warnings reported by clang.
   - Address comments from community reviews.

V3 -> V4:
   - Fix warnings and errors reported by "make W=1 C=1".

V2 -> V3:
   - Fix warnings and errors reported by kernel test robot:
     "Reported-by: kernel test robot <lkp@intel.com>"

V1 -> V2:
    - Address review comments on original patch series.
    - Divide PATCH 1/4 from the original series into 4 patches in
      v2 patch series: PATCH 1/7 to PATCH 4/7.
    - Fix clang build errors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: add ethtool support for Octeon PCI Endpoint NIC
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:03 +0000 (20:35 -0700)]
octeon_ep: add ethtool support for Octeon PCI Endpoint NIC

Add support for the following ethtool commands:

ethtool -i|--driver devname
ethtool devname
ethtool -s devname [speed N] [autoneg on|off] [advertise N]
ethtool -S|--statistics devname

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: add Tx/Rx processing and interrupt support
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:02 +0000 (20:35 -0700)]
octeon_ep: add Tx/Rx processing and interrupt support

Add support to enable MSI-x and register interrupts.
Add support to process Tx and Rx traffic. Includes processing
Tx completions and Rx refill.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: add support for ndo ops
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:01 +0000 (20:35 -0700)]
octeon_ep: add support for ndo ops

Add support for ndo ops to set MAC address, change MTU, get stats.
Add control path support to set MAC address, change MTU, get stats,
set speed, get and set link mode.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: add Tx/Rx ring resource setup and cleanup
Veerasenareddy Burru [Wed, 13 Apr 2022 03:35:00 +0000 (20:35 -0700)]
octeon_ep: add Tx/Rx ring resource setup and cleanup

Implement Tx/Rx ring resource allocation and cleanup.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: Add mailbox for control commands
Veerasenareddy Burru [Wed, 13 Apr 2022 03:34:59 +0000 (20:34 -0700)]
octeon_ep: Add mailbox for control commands

Add mailbox between host and NIC to send control commands from host to
NIC and receive responses and notifications from NIC to host driver,
like link status update.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: add hardware configuration APIs
Veerasenareddy Burru [Wed, 13 Apr 2022 03:34:58 +0000 (20:34 -0700)]
octeon_ep: add hardware configuration APIs

Implement hardware resource init and shutdown helper APIs.
This includes hardware Tx/Rx queue init/enable/disable/reset,
non queue interrupt handler that decodes non-queue interrupt type.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: Add driver framework and device initialization
Veerasenareddy Burru [Wed, 13 Apr 2022 03:34:57 +0000 (20:34 -0700)]
octeon_ep: Add driver framework and device initialization

Add driver framework and device setup and initialization for Octeon
PCI Endpoint NIC.

Add implementation to load module, initilaize, register network device,
cleanup and unload module.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'br-flush-filtering'
David S. Miller [Wed, 13 Apr 2022 11:46:26 +0000 (12:46 +0100)]
Merge branch 'br-flush-filtering'

Nikolay Aleksandrov says:

====================
net: bridge: add flush filtering support

This patch-set adds support to specify filtering conditions for a bulk
delete (flush) operation. This version uses a new nlmsghdr delete flag
called NLM_F_BULK in combination with a new ndo_fdb_del_bulk op which is
used to signal that the driver supports bulk deletes (that avoids
pushing common mac address checks to ndo_fdb_del implementations and
also has a different prototype and parsed attribute expectations, more
info in patch 03). The new delete flag can be used for any RTM_DEL*
type, implementations just need to be careful with older kernels which
are doing non-strict attribute parses. A new rtnl flag
(RTNL_FLAG_BULK_DEL_SUPPORTED) is used to show that the delete supports
NLM_F_BULK. A proper error is returned if bulk delete is not supported.
For old kernels I use the fact that mac address attribute (lladdr) is
mandatory in the classic fdb del case, but it's not allowed if bulk
deleting so older kernels will error out.

Patch 01 and 02 are minor rtnetlink cleanups to make the code easier to
read. They remove hardcoded values and use names instead. Patch 03 uses
BIT() for rtnl flags.
Patch 04 adds the new NLM_F_BULK delete request modifier, patch 05 adds
the new bulk delete flag and checks for it if the delete requests have
NLM_F_BULK set, it also warns if rtnl register is called with a non-delete
kind and the bulk delete flag is set.
Patch 06 adds the new ndo_fdb_del_bulk call. Patch 07 adds NLM_F_BULK
support to rtnl_fdb_del, on such request strict parsing is used only for
the supported attributes, and if the ndo is implemented it's called, the
NTF_SELF/MASTER rules are the same as for the standard rtnl_fdb_del.
Patch 08 implements bridge-specific minimal ndo_fdb_del_bulk call which
uses the current br_fdb_flush to delete all entries. Patch 09 adds
filtering support to the new bridge flush op which supports target
ifindex (port or bridge), vlan id and flags/state mask. Patch 10 adds
ndm state and flags mask attributes which will be used for filtering.
Patch 11 converts ndm state/flags and their masks to bridge-private flags
and fills them in the filter descriptor for matching. Finally patch 12
fills in the target ifindex (after validating it) and vlan id (already
validated by rtnl_fdb_flush) for matching. Flush filtering is needed
because user-space applications need a quick way to delete only a
specific set of entries, e.g. mlag implementations need a way to flush only
dynamic entries excluding externally learned ones or only externally
learned ones without static entries etc. Also apps usually want to target
only a specific vlan or port/vlan combination. The current 2 flush
operations (per port and bridge-wide) are not extensible and cannot
provide such filtering.

I decided against embedding new attrs into the old flush attributes for
multiple reasons - proper error handling on unsupported attributes,
older kernels silently flushing all, need for a second mechanism to
signal that the attribute should be parsed (e.g. using boolopts),
special treatment for permanent entries.

Examples:
$ bridge fdb flush dev bridge vlan 100 static
< flush all static entries on vlan 100 >
$ bridge fdb flush dev bridge vlan 1 dynamic
< flush all dynamic entries on vlan 1 >
$ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
< flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev ens16 vlan 1 dynamic master
< as above: flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev bridge nooffloaded nopermanent self
< flush all non-offloaded and non-permanent entries >
$ bridge fdb flush dev bridge static noextern_learn
< flush all static entries which are not externally learned >
$ bridge fdb flush dev bridge permanent
< flush all permanent entries >
$ bridge fdb flush dev bridge port bridge permanent
< flush all permanent entries pointing to the bridge itself >

Example of a flush call with unsupported netlink attribute (NDA_DST):
$ bridge fdb flush dev bridge vlan 100 dynamic dst
Error: Unsupported attribute.

Example of a flush call on an older kernel:
$ bridge fdb flush dev bridge dynamic
Error: invalid address.

Example of calling PF_UNSPEC RTM_DELNEIGH which doesn't support bulk delete
with NLM_F_BULK set (ip neigh is changed to add the flag):
$ ip n del 192.168.122.5 lladdr 00:11:22:33:44:55 dev ens3
Error: Bulk delete is not supported.

Note that all flags have their negated version (static vs nostatic etc)
and there are some tricky cases to handle like "static" which in flag
terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
mask matches on both but we need only NUD_NOARP to be set. That's
because permanent entries have both set so we can't just match on
NUD_NOARP. Also note that this flush operation doesn't treat permanent
entries in a special way (fdb_delete vs fdb_delete_local), it will
delete them regardless if any port is using them. We can extend the api
with a flag to do that if needed in the future.

Patch-sets (in order):
 - Initial bulk del infra and fdb flush filtering (this set)
 - iproute2 support
 - selftests

v4: Add and check for rtnl del bulk supported flag when using
    NLM_F_BULK (new patch 05), patches 01 - 03 are also new minor cleanups
    to remove use of raw values and make code easier to read, don't
    rename br_fdb_flush in patch 08, set port ifindex as flush target if
    NDA_IFINDEX is missing and flush was called with port netdev and
    NTF_MASTER (patch 12).

v3: Add NLM_F_BULK delete modifier and ndo_fdb_del_bulk callback,
    patches 01 - 03 and 06 are new. Patch 04 is changed to implement
    bulk_del instead of flush, patches 05, 07 and 08 are adjusted to
    use NDA_ attributes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: fdb: add support for flush filtering based on ifindex and vlan
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:02 +0000 (13:52 +0300)]
net: bridge: fdb: add support for flush filtering based on ifindex and vlan

Add support for fdb flush filtering based on destination ifindex and
vlan id. The ifindex must either match a port's device ifindex or the
bridge's. The vlan support is trivial since it's already validated by
rtnl_fdb_del, we just need to fill it in.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: fdb: add support for flush filtering based on ndm flags and state
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:01 +0000 (13:52 +0300)]
net: bridge: fdb: add support for flush filtering based on ndm flags and state

Add support for fdb flush filtering based on ndm flags and state. NDM
state and flags are mapped to bridge-specific flags and matched
according to the specified masks. NTF_USE is used to represent
added_by_user flag since it sets it on fdb add and we don't have a 1:1
mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are
ignored.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: rtnetlink: add ndm flags and state mask attributes
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:52:00 +0000 (13:52 +0300)]
net: rtnetlink: add ndm flags and state mask attributes

Add ndm flags/state masks which will be used for bulk delete filtering.
All of these are used by the bridge and vxlan drivers. Also minimal attr
policy validation is added, it is up to ndo_fdb_del_bulk implementers to
further validate them.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: fdb: add support for fine-grained flushing
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:59 +0000 (13:51 +0300)]
net: bridge: fdb: add support for fine-grained flushing

Add the ability to specify exactly which fdbs to be flushed. They are
described by a new structure - net_bridge_fdb_flush_desc. Currently it
can match on port/bridge ifindex, vlan id and fdb flags. It is used to
describe the existing dynamic fdb flush operation. Note that this flush
operation doesn't treat permanent entries in a special way (fdb_delete vs
fdb_delete_local), it will delete them regardless if any port is using
them, so currently it can't directly replace deletes which need to handle
that case, although we can extend it later for that too.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: fdb: add ndo_fdb_del_bulk
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:58 +0000 (13:51 +0300)]
net: bridge: fdb: add ndo_fdb_del_bulk

Add a minimal ndo_fdb_del_bulk implementation which flushes all entries.
Support for more fine-grained filtering will be added in the following
patches.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:57 +0000 (13:51 +0300)]
net: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del

When NLM_F_BULK is specified in a fdb del message we need to handle it
differently. First since this is a new call we can strictly validate the
passed attributes, at first only ifindex and vlan are allowed as these
will be the initially supported filter attributes, any other attribute
is rejected. The mac address is no longer mandatory, but we use it
to error out in older kernels because it cannot be specified with bulk
request (the attribute is not allowed) and then we have to dispatch
the call to ndo_fdb_del_bulk if the device supports it. The del bulk
callback can do further validation of the attributes if necessary.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: add ndo_fdb_del_bulk
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:56 +0000 (13:51 +0300)]
net: add ndo_fdb_del_bulk

Add a new netdev op called ndo_fdb_del_bulk, it will be later used for
driver-specific bulk delete implementation dispatched from rtnetlink. The
first user will be the bridge, we need it to signal to rtnetlink from
the driver that we support bulk delete operation (NLM_F_BULK).

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: rtnetlink: add bulk delete support flag
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:55 +0000 (13:51 +0300)]
net: rtnetlink: add bulk delete support flag

Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to
verify that the delete operation allows bulk object deletion. Also emit
a warning if anyone tries to set it for non-delete kind.

Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: netlink: add NLM_F_BULK delete request modifier
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:54 +0000 (13:51 +0300)]
net: netlink: add NLM_F_BULK delete request modifier

Add a new delete request modifier called NLM_F_BULK which, when
supported, would cause the request to delete multiple objects. The flag
is a convenient way to signal that a multiple delete operation is
requested which can be gradually added to different delete requests. In
order to make sure older kernels will error out if the operation is not
supported instead of doing something unintended we have to break a
required condition when implementing support for this flag, f.e. for
neighbors we will omit the mandatory mac address attribute.
Initially it will be used to add flush with filtering support for bridge
fdbs, but it also opens the door to add similar support to others.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: rtnetlink: use BIT for flag values
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:53 +0000 (13:51 +0300)]
net: rtnetlink: use BIT for flag values

Use BIT to define flag values.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: rtnetlink: add helper to extract msg type's kind
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:52 +0000 (13:51 +0300)]
net: rtnetlink: add helper to extract msg type's kind

Add a helper which extracts the msg type's kind using the kind mask (0x3).

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: rtnetlink: add msg kind names
Nikolay Aleksandrov [Wed, 13 Apr 2022 10:51:51 +0000 (13:51 +0300)]
net: rtnetlink: add msg kind names

Add rtnl kind names instead of using raw values. We'll need to
check for DEL kind later to validate bulk flag support.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ftgmac100: access hardware register after clock ready
Dylan Hung [Tue, 12 Apr 2022 11:48:59 +0000 (19:48 +0800)]
net: ftgmac100: access hardware register after clock ready

AST2600 MAC register 0x58 is writable only when the MAC clock is
enabled.  Usually, the MAC clock is enabled by the bootloader so
register 0x58 is set normally when the bootloader is involved.  To make
ast2600 ftgmac100 work without the bootloader, postpone the register
write until the clock is ready.

Fixes: 137d23cea1c0 ("net: ftgmac100: Fix Aspeed ast2600 TX hang issue")
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'net-ti-storm-prevention-support'
David S. Miller [Wed, 13 Apr 2022 11:42:40 +0000 (12:42 +0100)]
Merge branch 'net-ti-storm-prevention-support'

Grygorii Strashko says:

====================
net: ethernet: ti: enable bc/mc storm prevention support

This series first adds supports for the ALE feature to rate limit number ingress
broadcast(BC)/multicast(MC) packets per/sec which main purpose is BC/MC storm
prevention.

And then enables corresponding support for ingress broadcast(BC)/multicast(MC)
packets rate limiting for TI CPSW switchdev and AM65x/J221E CPSW_NUSS drivers by
implementing HW offload for simple tc-flower with policer action with matches
on dst_mac/mask:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police rate pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

The solution inspired patch from Vladimir Oltean [1].

Changes in v3:
  - comments applied
  - policer validation added

Changes in v2:
 - switch to packet-per-second policing introduced by
   commit 2ffe0395288a ("net/sched: act_police: add support for packet-per-second policing") [2]

v2: https://patchwork.kernel.org/project/netdevbpf/cover/20211101170122.19160-1-grygorii.strashko@ti.com/
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20201114035654.32658-1-grygorii.strashko@ti.com/

[1] https://lore.kernel.org/patchwork/patch/1217254/
[2] https://patchwork.kernel.org/project/netdevbpf/cover/20210312140831.23346-1-simon.horman@netronome.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw_new: enable bc/mc storm prevention support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:29 +0000 (13:29 +0300)]
net: ethernet: ti: cpsw_new: enable bc/mc storm prevention support

This patch enables support for ingress broadcast(BC)/multicast(MC) packets
rate limiting in TI CPSW switchdev driver (the corresponding ALE support
was added in previous patch) by implementing HW offload for simple
tc-flower with policer action with matches on dst_mac:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police rate pkts_rate 10000 pkts_burst 1 drop

  pkts_burst - not used.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:28 +0000 (13:29 +0300)]
net: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support

This patch enables support for ingress broadcast(BC)/multicast(MC) packets
rate limiting in TI AM65x CPSW driver (the corresponding ALE support was
added in previous patch) by implementing HW offload for simple tc-flower
with policer action with matches on dst_mac/mask:
 - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate
limiting (exact match)
 - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC
packets rate limiting

The CPSW supports MC/BC packets rate limiting in packets/sec and affects
all ingress MC/BC packets and serves as BC/MC storm prevention feature.

Examples:
- BC rate limit to 1000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
  action police pkts_rate 1000 pkts_burst 1 drop

- MC rate limit to 20000pps:
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \
  action police rate pkts_rate 20000 pkts_burst 1 drop

  pkts_burst - not used.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodrivers: net: cpsw: ale: add broadcast/multicast rate limit support
Grygorii Strashko [Tue, 12 Apr 2022 10:29:27 +0000 (13:29 +0300)]
drivers: net: cpsw: ale: add broadcast/multicast rate limit support

The CPSW ALE supports feature to rate limit number ingress
broadcast(BC)/multicast(MC) packets per/sec which main purpose is BC/MC
storm prevention.

The ALE BC/MC packet rate limit configuration consist of two parts:
- global
  ALE_CONTROL.ENABLE_RATE_LIMIT bit 0 which enables rate limiting globally
  ALE_PRESCALE.PRESCALE specifies rate limiting interval
- per-port
  ALE_PORTCTLx.BCASTMCAST/_LIMIT specifies number of BC/MC packets allowed
  per rate limiting interval.
  When port.BCASTMCAST/_LIMIT is 0 rate limiting is disabled for Port.

When BC/MC packet rate limiting is enabled the number of allowed packets
per/sec is defined as:
  number_of_packets/sec = (Fclk / ALE_PRESCALE) * port.BCASTMCAST/_LIMIT

Hence, the ALE_PRESCALE configuration is common for all ports the 1ms
interval is selected and configured during ALE initialization while
port.BCAST/MCAST_LIMIT are configured per-port.
This allows to achieve:
 - min number_of_packets = 1000 when port.BCAST/MCAST_LIMIT = 1
 - max number_of_packets = 1000 * 255 = 255000
   when port.BCAST/MCAST_LIMIT = 0xFF

The ALE_CONTROL.ENABLE_RATE_LIMIT can also be enabled once during ALE
initialization as rate limiting enabled by non zero port.BCASTMCAST/_LIMIT
values.

This patch implements above logic in ALE and adds new ALE APIs
 cpsw_ale_rx_ratelimit_bc();
 cpsw_ale_rx_ratelimit_mc();

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phylink: remove phylink_helper_basex_speed()
Russell King (Oracle) [Tue, 12 Apr 2022 10:24:00 +0000 (11:24 +0100)]
net: phylink: remove phylink_helper_basex_speed()

As there are now no users of phylink_helper_basex_speed(), we can
remove this obsolete functionality.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoRevert "net: dsa: setup master before ports"
Vladimir Oltean [Tue, 12 Apr 2022 09:44:26 +0000 (12:44 +0300)]
Revert "net: dsa: setup master before ports"

This reverts commit 11fd667dac315ea3f2469961f6d2869271a46cae.

dsa_slave_change_mtu() updates the MTU of the DSA master and of the
associated CPU port, but only if it detects a change to the master MTU.

The blamed commit in the Fixes: tag below addressed a regression where
dsa_slave_change_mtu() would return early and not do anything due to
ds->ops->port_change_mtu() not being implemented.

However, that commit also had the effect that the master MTU got set up
to the correct value by dsa_master_setup(), but the associated CPU port's
MTU did not get updated. This causes breakage for drivers that rely on
the ->port_change_mtu() DSA call to account for the tagging overhead on
the CPU port, and don't set up the initial MTU during the setup phase.

Things actually worked before because they were in a fragile equilibrium
where dsa_slave_change_mtu() was called before dsa_master_setup() was.
So dsa_slave_change_mtu() could actually detect a change and update the
CPU port MTU too.

Restore the code to the way things used to work by reverting the reorder
of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did
not have a concrete motivation going for it anyway, it just looked
better.

Fixes: 066dfc429040 ("Revert "net: dsa: stop updating master MTU from master.c"")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomacvlan: Fix leaking skb in source mode with nodst option
Martin Willi [Tue, 12 Apr 2022 09:34:57 +0000 (11:34 +0200)]
macvlan: Fix leaking skb in source mode with nodst option

The MACVLAN receive handler clones skbs to all matching source MACVLAN
interfaces, before it passes the packet along to match on destination
based MACVLANs.

When using the MACVLAN nodst mode, passing the packet to destination based
MACVLANs is omitted and the handler returns with RX_HANDLER_CONSUMED.
However, the passed skb is not freed, leaking for any packet processed
with the nodst option.

Properly free the skb when consuming packets to fix that leak.

Fixes: 427f0c8c194b ("macvlan: Add nodst option to macvlan type source")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: mtk_eth_soc: use after free in __mtk_ppe_check_skb()
Dan Carpenter [Tue, 12 Apr 2022 09:24:19 +0000 (12:24 +0300)]
net: ethernet: mtk_eth_soc: use after free in __mtk_ppe_check_skb()

The __mtk_foe_entry_clear() function frees "entry" so we have to use
the _safe() version of hlist_for_each_entry() to prevent a use after
free.

Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw-nuss: using pm_runtime_resume_and_get instead of pm_runt...
Minghao Chi [Tue, 12 Apr 2022 09:05:15 +0000 (09:05 +0000)]
net: ethernet: ti: am65-cpsw-nuss: using pm_runtime_resume_and_get instead of pm_runtime_get_sync

Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoNFC: NULL out the dev->rfkill to prevent UAF
Lin Ma [Tue, 12 Apr 2022 05:32:08 +0000 (13:32 +0800)]
NFC: NULL out the dev->rfkill to prevent UAF

Commit 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
assumes the device_is_registered() in function nfc_dev_up() will help
to check when the rfkill is unregistered. However, this check only
take effect when device_del(&dev->dev) is done in nfc_unregister_device().
Hence, the rfkill object is still possible be dereferenced.

The crash trace in latest kernel (5.18-rc2):

[   68.760105] ==================================================================
[   68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750
[   68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313
[   68.760756]
[   68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4
[   68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   68.760756] Call Trace:
[   68.760756]  <TASK>
[   68.760756]  dump_stack_lvl+0x57/0x7d
[   68.760756]  print_report.cold+0x5e/0x5db
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  kasan_report+0xbe/0x1c0
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  __lock_acquire+0x3ec1/0x6750
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? register_lock_class+0x18d0/0x18d0
[   68.760756]  lock_acquire+0x1ac/0x4f0
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? mutex_lock_io_nested+0x12c0/0x12c0
[   68.760756]  ? nla_get_range_signed+0x540/0x540
[   68.760756]  ? _raw_spin_lock_irqsave+0x4e/0x50
[   68.760756]  _raw_spin_lock_irqsave+0x39/0x50
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  rfkill_blocked+0xe/0x60
[   68.760756]  nfc_dev_up+0x84/0x260
[   68.760756]  nfc_genl_dev_up+0x90/0xe0
[   68.760756]  genl_family_rcv_msg_doit+0x1f4/0x2f0
[   68.760756]  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230
[   68.760756]  ? security_capable+0x51/0x90
[   68.760756]  genl_rcv_msg+0x280/0x500
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? lock_acquire+0x1ac/0x4f0
[   68.760756]  ? nfc_genl_dev_down+0xe0/0xe0
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  netlink_rcv_skb+0x11b/0x340
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? netlink_ack+0x9c0/0x9c0
[   68.760756]  ? netlink_deliver_tap+0x136/0xb00
[   68.760756]  genl_rcv+0x1f/0x30
[   68.760756]  netlink_unicast+0x430/0x710
[   68.760756]  ? memset+0x20/0x40
[   68.760756]  ? netlink_attachskb+0x740/0x740
[   68.760756]  ? __build_skb_around+0x1f4/0x2a0
[   68.760756]  netlink_sendmsg+0x75d/0xc00
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  sock_sendmsg+0xdf/0x110
[   68.760756]  __sys_sendto+0x19e/0x270
[   68.760756]  ? __ia32_sys_getpeername+0xa0/0xa0
[   68.760756]  ? fd_install+0x178/0x4c0
[   68.760756]  ? fd_install+0x195/0x4c0
[   68.760756]  ? kernel_fpu_begin_mask+0x1c0/0x1c0
[   68.760756]  __x64_sys_sendto+0xd8/0x1b0
[   68.760756]  ? lockdep_hardirqs_on+0xbf/0x130
[   68.760756]  ? syscall_enter_from_user_mode+0x1d/0x50
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756] RIP: 0033:0x7f67fb50e6b3
...
[   68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[   68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3
[   68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003
[   68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c
[   68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e
[   68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003

[   68.760756]  </TASK>
[   68.760756]
[   68.760756] Allocated by task 279:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  __kasan_kmalloc+0x81/0xa0
[   68.760756]  rfkill_alloc+0x7f/0x280
[   68.760756]  nfc_register_device+0xa3/0x1a0
[   68.760756]  nci_register_device+0x77a/0xad0
[   68.760756]  nfcmrvl_nci_register_dev+0x20b/0x2c0
[   68.760756]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   68.760756]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   68.760756]  tty_ioctl+0x764/0x1310
[   68.760756]  __x64_sys_ioctl+0x122/0x190
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756]
[   68.760756] Freed by task 314:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  kasan_set_track+0x21/0x30
[   68.760756]  kasan_set_free_info+0x20/0x30
[   68.760756]  __kasan_slab_free+0x108/0x170
[   68.760756]  kfree+0xb0/0x330
[   68.760756]  device_release+0x96/0x200
[   68.760756]  kobject_put+0xf9/0x1d0
[   68.760756]  nfc_unregister_device+0x77/0x190
[   68.760756]  nfcmrvl_nci_unregister_dev+0x88/0xd0
[   68.760756]  nci_uart_tty_close+0xdf/0x180
[   68.760756]  tty_ldisc_kill+0x73/0x110
[   68.760756]  tty_ldisc_hangup+0x281/0x5b0
[   68.760756]  __tty_hangup.part.0+0x431/0x890
[   68.760756]  tty_release+0x3a8/0xc80
[   68.760756]  __fput+0x1f0/0x8c0
[   68.760756]  task_work_run+0xc9/0x170
[   68.760756]  exit_to_user_mode_prepare+0x194/0x1a0
[   68.760756]  syscall_exit_to_user_mode+0x19/0x50
[   68.760756]  do_syscall_64+0x48/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae

This patch just add the null out of dev->rfkill to make sure such
dereference cannot happen. This is safe since the device_lock() already
protect the check/write from data race.

Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: exthdrs: use swap() instead of open coding it
Guo Zhengkui [Tue, 12 Apr 2022 03:20:58 +0000 (11:20 +0800)]
ipv6: exthdrs: use swap() instead of open coding it

Address the following coccicheck warning:
net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap()

by using swap() for the swapping of variable values and drop
the tmp (`addr`) variable that is not needed any more.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: fib_rule_tests: add support to select a test to run
Alaa Mohamed [Tue, 12 Apr 2022 02:04:31 +0000 (04:04 +0200)]
selftests: net: fib_rule_tests: add support to select a test to run

Add boilerplate test loop in test to run all tests
in fib_rule_tests.sh

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: mtk_eth_soc: use standard property for cci-control-port
Lorenzo Bianconi [Mon, 11 Apr 2022 10:13:25 +0000 (12:13 +0200)]
net: ethernet: mtk_eth_soc: use standard property for cci-control-port

Rely on standard cci-control-port property to identify CCI port
reference.
Update mt7622 dts binding.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Wed, 13 Apr 2022 10:57:55 +0000 (11:57 +0100)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2022-04-12

This series contains updates to i40e and ice drivers.

Joe Damato adds TSO support for MPLS packets on i40e and ice drivers. He
also adds tracking and reporting of tx_stopped statistic for i40e.

Nabil S. Alramli adds reporting of tx_restart to ethtool for i40e.

Mateusz adds new device id support for i40e.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'tls-rx-refactor-part-3'
David S. Miller [Wed, 13 Apr 2022 10:45:39 +0000 (11:45 +0100)]
Merge branch 'tls-rx-refactor-part-3'

Jakub Kicinski says:

====================
tls: rx: random refactoring part 3

TLS Rx refactoring. Part 3 of 3. This set is mostly around rx_list
and async processing. The last two patches are minor optimizations.
A couple of features to follow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: only copy IV from the packet for TLS 1.2
Jakub Kicinski [Mon, 11 Apr 2022 19:19:17 +0000 (12:19 -0700)]
tls: rx: only copy IV from the packet for TLS 1.2

TLS 1.3 and ChaChaPoly don't carry IV in the packet.
The code before this change would copy out iv_size
worth of whatever followed the TLS header in the packet
and then for TLS 1.3 | ChaCha overwrite that with
the sequence number. Waste of cycles especially
with TLS 1.2 being close to dead and TLS 1.3 being
the common case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: use MAX_IV_SIZE for allocations
Jakub Kicinski [Mon, 11 Apr 2022 19:19:16 +0000 (12:19 -0700)]
tls: rx: use MAX_IV_SIZE for allocations

IVs are 8 or 16 bytes, no point reading out the exact value
for quantities this small.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: use async as an in-out argument
Jakub Kicinski [Mon, 11 Apr 2022 19:19:15 +0000 (12:19 -0700)]
tls: rx: use async as an in-out argument

Propagating EINPROGRESS thru multiple layers of functions is
error prone. Use darg->async as an in/out argument, like we
use darg->zc today. On input it tells the code if async is
allowed, on output if it took place.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: return the already-copied data on crypto error
Jakub Kicinski [Mon, 11 Apr 2022 19:19:14 +0000 (12:19 -0700)]
tls: rx: return the already-copied data on crypto error

async crypto handler will report the socket error no need
to report it again. We can, however, let the data we already
copied be reported to user space but we need to make sure
the error will be reported next time around.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: treat process_rx_list() errors as transient
Jakub Kicinski [Mon, 11 Apr 2022 19:19:13 +0000 (12:19 -0700)]
tls: rx: treat process_rx_list() errors as transient

process_rx_list() only fails if it can't copy data to user
space. There is no point recording the error onto sk->sk_err
or giving up on the data which was read partially. Treat
the return value like a normal socket partial read.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: assume crypto always calls our callback
Jakub Kicinski [Mon, 11 Apr 2022 19:19:12 +0000 (12:19 -0700)]
tls: rx: assume crypto always calls our callback

If crypto didn't always invoke our callback for async
we'd not be clearing skb->sk and would crash in the
skb core when freeing it. This if must be dead code.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: don't handle TLS 1.3 in the async crypto callback
Jakub Kicinski [Mon, 11 Apr 2022 19:19:11 +0000 (12:19 -0700)]
tls: rx: don't handle TLS 1.3 in the async crypto callback

Async crypto never worked with TLS 1.3 and was explicitly disabled in
commit 8497ded2d16c ("net/tls: Disable async decrytion for tls1.3").
There's no need for us to handle TLS 1.3 padding in the async cb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: move counting TlsDecryptErrors for sync
Jakub Kicinski [Mon, 11 Apr 2022 19:19:10 +0000 (12:19 -0700)]
tls: rx: move counting TlsDecryptErrors for sync

Move counting TlsDecryptErrors to tls_do_decryption()
where differences between sync and async crypto are
reconciled.

No functional changes, this code just always gave
me a pause.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: reuse leave_on_list label for psock
Jakub Kicinski [Mon, 11 Apr 2022 19:19:09 +0000 (12:19 -0700)]
tls: rx: reuse leave_on_list label for psock

The code is identical, we can save a few LoC.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotls: rx: consistently use unlocked accessors for rx_list
Jakub Kicinski [Mon, 11 Apr 2022 19:19:08 +0000 (12:19 -0700)]
tls: rx: consistently use unlocked accessors for rx_list

rx_list is protected by the socket lock, no need to take
the built-in spin lock on accesses.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoALSA: usb-audio: Limit max buffer and period sizes per time
Takashi Iwai [Tue, 12 Apr 2022 13:07:40 +0000 (15:07 +0200)]
ALSA: usb-audio: Limit max buffer and period sizes per time

In the previous fix, we increased the max buffer bytes from 1MB to 4MB
so that we can use bigger buffers for the modern HiFi devices with
higher rates, more channels and wider formats.  OTOH, extending this
has a concern that too big buffer is allowed for the lower rates, less
channels and narrower formats; when an application tries to allocate
as big buffer as possible, it'll lead to unexpectedly too huge size.

Also, we had a problem about the inconsistent max buffer and period
bytes for the implicit feedback mode when both streams have different
channels.  This was fixed by the (relatively complex) patch to reduce
the max buffer and period bytes accordingly.

This is an alternative fix for those, a patch to kill two birds with
one stone (*): instead of increasing the max buffer bytes blindly and
applying the reduction per channels, we simply use the hw constraints
for the buffer and period "time".  Meanwhile the max buffer and period
bytes are set unlimited instead.

Since the inconsistency of buffer (and period) bytes comes from the
difference of the channels in the tied streams, as long as we care
only about the buffer (and period) time, it doesn't matter; the buffer
time is same for different channels, although we still allow higher
buffer size.  Similarly, this will allow more buffer bytes for HiFi
devices while it also keeps the reasonable size for the legacy
devices, too.

As of this patch, the max period and buffer time are set to 1 and 2
seconds, which should be large enough for all possible use cases.

(*) No animals were harmed in the making of this patch.

Fixes: 98c27add5d96 ("ALSA: usb-audio: Cap upper limits of buffer/period bytes for implicit fb")
Fixes: fee2ec8cceb3 ("ALSA: usb-audio: Increase max buffer size")
Link: https://lore.kernel.org/r/20220412130740.18933-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 years agoALSA: memalloc: Add fallback SG-buffer allocations for x86
Takashi Iwai [Wed, 13 Apr 2022 05:48:08 +0000 (07:48 +0200)]
ALSA: memalloc: Add fallback SG-buffer allocations for x86

The recent change for memory allocator replaced the SG-buffer handling
helper for x86 with the standard non-contiguous page handler.  This
works for most cases, but there is a corner case I obviously
overlooked, namely, the fallback of non-contiguous handler without
IOMMU.  When the system runs without IOMMU, the core handler tries to
use the continuous pages with a single SGL entry.  It works nicely for
most cases, but when the system memory gets fragmented, the large
allocation may fail frequently.

Ideally the non-contig handler could deal with the proper SG pages,
it's cumbersome to extend for now.  As a workaround, here we add new
types for (minimalistic) SG allocations, instead, so that the
allocator falls back to those types automatically when the allocation
with the standard API failed.

BTW, one better (but pretty minor) improvement from the previous
SG-buffer code is that this provides the proper mmap support without
the PCM's page fault handling.

Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
BugLink: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2272
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1198248
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220413054808.7547-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 years agoMerge tag 'hardening-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 13 Apr 2022 00:29:40 +0000 (14:29 -1000)]
Merge tag 'hardening-v5.18-rc3' of git://git./linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - latent_entropy: Use /dev/urandom instead of small GCC seed (Jason
   Donenfeld)

 - uapi/stddef.h: add missed include guards (Tadeusz Struk)

* tag 'hardening-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: latent_entropy: use /dev/urandom
  uapi/linux/stddef.h: Add include guards

2 years agoMerge tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Wed, 13 Apr 2022 00:23:19 +0000 (14:23 -1000)]
Merge tag 'nfsd-5.18-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a write performance regression

 - Fix crashes during request deferral on RDMA transports

* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix the svc_deferred_event trace class
  SUNRPC: Fix NFSD's request deferral on RDMA transports
  nfsd: Clean up nfsd_file_put()
  nfsd: Fix a write performance regression
  SUNRPC: Return true/false (not 1/0) from bool functions

2 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 13 Apr 2022 00:16:33 +0000 (14:16 -1000)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86:

   - Miscellaneous bugfixes

   - A small cleanup for the new workqueue code

   - Documentation syntax fix

  RISC-V:

   - Remove hgatp zeroing in kvm_arch_vcpu_put()

   - Fix alignment of the guest_hang() in KVM selftest

   - Fix PTE A and D bits in KVM selftest

   - Missing #include in vcpu_fp.c

  ARM:

   - Some PSCI fixes after introducing PSCIv1.1 and SYSTEM_RESET2

   - Fix the MMU write-lock not being taken on THP split

   - Fix mixed-width VM handling

   - Fix potential UAF when debugfs registration fails

   - Various selftest updates for all of the above"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (24 commits)
  KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU
  KVM: SVM: Do not activate AVIC for SEV-enabled guest
  Documentation: KVM: Add SPDX-License-Identifier tag
  selftests: kvm: add tsc_scaling_sync to .gitignore
  RISC-V: KVM: include missing hwcap.h into vcpu_fp
  KVM: selftests: riscv: Fix alignment of the guest_hang() function
  KVM: selftests: riscv: Set PTE A and D bits in VS-stage page table
  RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put()
  selftests: KVM: Free the GIC FD when cleaning up in arch_timer
  selftests: KVM: Don't leak GIC FD across dirty log test iterations
  KVM: Don't create VM debugfs files outside of the VM directory
  KVM: selftests: get-reg-list: Add KVM_REG_ARM_FW_REG(3)
  KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
  KVM: arm64: selftests: Introduce vcpu_width_config
  KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs
  KVM: arm64: vgic: Remove unnecessary type castings
  KVM: arm64: Don't split hugepages outside of MMU write lock
  KVM: arm64: Drop unneeded minor version check from PSCI v1.x handler
  KVM: arm64: Actually prevent SMC64 SYSTEM_RESET2 from AArch32
  KVM: arm64: Generally disallow SMC64 for AArch32 guests
  ...

2 years agoMerge tag 'media/v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 13 Apr 2022 00:08:43 +0000 (14:08 -1000)]
Merge tag 'media/v5.18-2' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - a regression fix for si2157

 - a Kconfig dependency fix for imx-mipi-csis

 - fix the rockchip/rga driver probing logic

* tag 'media/v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: si2157: unknown chip version Si2147-A30 ROM 0x50
  media: platform: imx-mipi-csis: Add dependency on VIDEO_DEV
  media: rockchip/rga: do proper error checking in probe

2 years agostat: fix inconsistency between struct stat and struct compat_stat
Mikulas Patocka [Tue, 12 Apr 2022 09:41:00 +0000 (05:41 -0400)]
stat: fix inconsistency between struct stat and struct compat_stat

struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
st_dev and st_rdev; struct compat_stat (defined in
arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
a 16-bit padding.

This patch fixes struct compat_stat to match struct stat.

[ Historical note: the old x86 'struct stat' did have that 16-bit field
  that the compat layer had kept around, but it was changes back in 2003
  by "struct stat - support larger dev_t":

    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d

  and back in those days, the x86_64 port was still new, and separate
  from the i386 code, and had already picked up the old version with a
  16-bit st_dev field ]

Note that we can't change compat_dev_t because it is used by
compat_loop_info.

Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
old_valid_dev to test if the value fits into them.  This fixes
-EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
number 259.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>