platform/kernel/linux-rpi.git
2 years agomm: vmscan: Reduce throttling due to a failure to make progress
Mel Gorman [Thu, 2 Dec 2021 15:06:14 +0000 (15:06 +0000)]
mm: vmscan: Reduce throttling due to a failure to make progress

Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.  In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before stalling.  In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse.  Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback.  If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly.  Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.

Alexey's original case was the most straight forward

for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily, after the patch the test
completes in a few seconds similar to 5.15.

Alexey's second test case added watching a youtube video while tail runs
10 times.  On 5.15, playback only jitters slightly, 5.16-rc1 stalls a
lot with lots of frames missing and numerous audio glitches.  With this
patch applies, the video plays similarly to 5.15.

[lkp@intel.com: Fix W=1 build warning]

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 31 Dec 2021 17:28:48 +0000 (09:28 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc mm fixes from Andrew Morton:
 "2 patches.

  Subsystems affected by this patch series: mm (userfaultfd and damon)"

* akpm:
  mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
  userfaultfd/selftests: fix hugetlb area allocations

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 31 Dec 2021 17:22:25 +0000 (09:22 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three fixes, all in drivers. The lpfc one doesn't look exploitable,
  but nasty things could happen in string operations if mybuf ends up
  with an on stack unterminated string"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: vmw_pvscsi: Set residual data length conditionally
  scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
  scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()

2 years agomm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
SeongJae Park [Fri, 31 Dec 2021 04:12:34 +0000 (20:12 -0800)]
mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'

DAMON debugfs interface increases the reference counts of 'struct pid's
for targets from the 'target_ids' file write callback
('dbgfs_target_ids_write()'), but decreases the counts only in DAMON
monitoring termination callback ('dbgfs_before_terminate()').

Therefore, when 'target_ids' file is repeatedly written without DAMON
monitoring start/termination, the reference count is not decreased and
therefore memory for the 'struct pid' cannot be freed.  This commit
fixes this issue by decreasing the reference counts when 'target_ids' is
written.

Link: https://lkml.kernel.org/r/20211229124029.23348-1-sj@kernel.org
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agouserfaultfd/selftests: fix hugetlb area allocations
Mike Kravetz [Fri, 31 Dec 2021 04:12:31 +0000 (20:12 -0800)]
userfaultfd/selftests: fix hugetlb area allocations

Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
or any environment where there are 'just enough' hugetlb pages will
always fail with:

  testing events (fork, remap, remove):
ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)

The ENOMEM error code implies there are not enough hugetlb pages.
However, there are free hugetlb pages but they are all reserved.  There
is a basic problem with the way the test allocates hugetlb pages which
has existed since the test was originally written.

Due to the way 'cleanup' was done between different phases of the test,
this issue was masked until recently.  The issue was uncovered by commit
8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each
test").

For the hugetlb test, src and dst areas are allocated as PRIVATE
mappings of a hugetlb file.  This means that at mmap time, pages are
reserved for the src and dst areas.  At the start of event testing (and
other tests) the src area is populated which results in allocation of
huge pages to fill the area and consumption of reserves associated with
the area.  Then, a child is forked to fault in the dst area.  Note that
the dst area was allocated in the parent and hence the parent owns the
reserves associated with the mapping.  The child has normal access to
the dst area, but can not use the reserves created/owned by the parent.
Thus, if there are no other huge pages available allocation of a page
for the dst by the child will fail.

Fix by not creating reserves for the dst area.  In this way the child
can use free (non-reserved) pages.

Also, MAP_PRIVATE of a file only makes sense if you are interested in
the contents of the file before making a COW copy.  The test does not do
this.  So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
hugetlb mapping.  There is no need to create a hugetlb file in the
non-shared case.

Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 31 Dec 2021 02:25:43 +0000 (18:25 -0800)]
Merge tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This is a bit bigger than I'd like, however it has two weeks of amdgpu
  fixes in it, since they missed last week, which was very small.

  The nouveau regression is probably the biggest fix in here, and it
  needs to go into 5.15 as well, two i915 fixes, and then a scattering
  of amdgpu fixes. The biggest fix in there is for a fencing NULL
  pointer dereference, the rest are pretty minor.

  For the misc team, I've pulled the two misc fixes manually since I'm
  not sure what is happening at this time of year!

  The amdgpu maintainers have the outstanding runpm regression to fix
  still, they are just working through the last bits of it now.

  Summary:

  nouveau:
   - fencing regression fix

  i915:
   - Fix possible uninitialized variable
   - Fix composite fence seqno icrement on each fence creation

  amdgpu:
   - Fencing fix
   - XGMI fix
   - VCN regression fix
   - IP discovery regression fixes
   - Fix runpm documentation
   - Suspend/resume fixes
   - Yellow Carp display fixes
   - MCLK power management fix
   - dma-buf fix"

* tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
  drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
  drm/amd/display: Set optimize_pwr_state for DCN31
  drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
  drm/amd/display: Added power down for DCN10
  drm/amd/display: fix B0 TMDS deepcolor no dislay issue
  drm/amdgpu: no DC support for headless chips
  drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
  drm/amdgpu: always reset the asic in suspend (v2)
  drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
  drm/i915: Increment composite fence seqno
  drm/i915: Fix possible uninitialized variable in parallel extension
  drm/amdgpu: fix runpm documentation
  drm/nouveau: wait for the exclusive fence after the shared ones v2
  drm/amdgpu: add support for IP discovery gc_info table v2
  drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  drm/amd/pm: Fix xgmi link control on aldebaran
  drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
  drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify

2 years agoMerge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc into...
Dave Airlie [Fri, 31 Dec 2021 01:40:29 +0000 (11:40 +1000)]
Merge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes

This merges two fixes that haven't been sent to me yet, but I wanted to get in.

One amdgpu fix, but one nouveau regression fixer.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2 years agofs/mount_setattr: always cleanup mount_kattr
Christian Brauner [Thu, 30 Dec 2021 19:23:09 +0000 (20:23 +0100)]
fs/mount_setattr: always cleanup mount_kattr

Make sure that finish_mount_kattr() is called after mount_kattr was
succesfully built in both the success and failure case to prevent
leaking any references we took when we built it.  We returned early if
path lookup failed thereby risking to leak an additional reference we
took when building mount_kattr when an idmapped mount was requested.

Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 Dec 2021 19:12:12 +0000 (11:12 -0800)]
Merge tag 'net-5.16-rc8' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from.. Santa?

  No regressions on our radar at this point. The igc problem fixed here
  was the last one I was tracking but it was broken in previous
  releases, anyway. Mostly driver fixes and a couple of largish SMC
  fixes.

  Current release - regressions:

   - xsk: initialise xskb free_list_node, fixup for a -rc7 fix

  Current release - new code bugs:

   - mlx5: handful of minor fixes:

   - use first online CPU instead of hard coded CPU

   - fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

   - fix skb memory leak when TC classifier action offloads are disabled

   - fix memory leak with rules with internal OvS port

  Previous releases - regressions:

   - igc: do not enable crosstimestamping for i225-V models

  Previous releases - always broken:

   - udp: use datalen to cap ipv6 udp max gso segments

   - fix use-after-free in tw_timer_handler due to early free of stats

   - smc: fix kernel panic caused by race of smc_sock

   - smc: don't send CDC/LLC message if link not ready, avoid timeouts

   - sctp: use call_rcu to free endpoint, avoid UAF in sock diag

   - bridge: mcast: add and enforce query interval minimum

   - usb: pegasus: do not drop long Ethernet frames

   - mlx5e: fix ICOSQ recovery flow for XSK

   - nfc: uapi: use kernel size_t to fix user-space builds"

* tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  fsl/fman: Fix missing put_device() call in fman_port_probe
  selftests: net: using ping6 for IPv6 in udpgro_fwd.sh
  Documentation: fix outdated interpretation of ip_no_pmtu_disc
  net/ncsi: check for error return from call to nla_put_u32
  net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
  net: fix use-after-free in tw_timer_handler
  selftests: net: Fix a typo in udpgro_fwd.sh
  selftests/net: udpgso_bench_tx: fix dst ip argument
  net: bridge: mcast: add and enforce startup query interval minimum
  net: bridge: mcast: add and enforce query interval minimum
  ipv6: raw: check passed optlen before reading
  xsk: Initialise xskb free_list_node
  net/mlx5e: Fix wrong features assignment in case of error
  net/mlx5e: TC, Fix memory leak with rules with internal port
  ionic: Initialize the 'lif->dbid_inuse' bitmap
  igc: Fix TX timestamp support for non-MSI-X platforms
  igc: Do not enable crosstimestamping for i225-V models
  net/smc: fix kernel panic caused by race of smc_sock
  net/smc: don't send CDC/LLC message if link not ready
  NFC: st21nfca: Fix memory leak in device probe and remove
  ...

2 years agoMerge tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Thu, 30 Dec 2021 17:52:32 +0000 (09:52 -0800)]
Merge tag 'char-misc-5.16' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are two misc driver fixes for 5.16-final:

   - binder accounting fix to resolve reported problem

   - nitro_enclaves fix for mmap assert warning output

  Both of these have been for over a week with no reported issues"

* tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert
  binder: fix async_free_space accounting for empty parcels

2 years agoMerge tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Thu, 30 Dec 2021 17:49:54 +0000 (09:49 -0800)]
Merge tag 'usb-5.16' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 5.16 to resolve some reported
  problems:

   - mtu3 driver fixes

   - typec ucsi driver fix

   - xhci driver quirk added

   - usb gadget f_fs fix for reported crash

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Only check the contract if there is a connection
  xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
  usb: mtu3: set interval of FS intr and isoc endpoint
  usb: mtu3: fix list_head check warning
  usb: mtu3: add memory barrier before set GPD's HWO
  usb: mtu3: fix interval value for intr and isoc
  usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.

2 years agofsl/fman: Fix missing put_device() call in fman_port_probe
Miaoqian Lin [Thu, 30 Dec 2021 12:26:27 +0000 (12:26 +0000)]
fsl/fman: Fix missing put_device() call in fman_port_probe

The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the and error handling paths.

Fixes: 18a6c85fcc78 ("fsl/fman: Add FMan Port Support")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: using ping6 for IPv6 in udpgro_fwd.sh
Jianguo Wu [Thu, 30 Dec 2021 10:40:29 +0000 (18:40 +0800)]
selftests: net: using ping6 for IPv6 in udpgro_fwd.sh

udpgro_fwd.sh output following message:
  ping: 2001:db8:1::100: Address family for hostname not supported

Using ping6 when pinging IPv6 addresses.

Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoDocumentation: fix outdated interpretation of ip_no_pmtu_disc
xu xin [Thu, 30 Dec 2021 03:28:56 +0000 (03:28 +0000)]
Documentation: fix outdated interpretation of ip_no_pmtu_disc

The updating way of pmtu has changed, but documentation is still in the
old way. So this patch updates the interpretation of ip_no_pmtu_disc and
min_pmtu.

See commit 28d35bcdd3925 ("net: ipv4: don't let PMTU updates increase
route MTU")

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'amd-drm-fixes-5.16-2021-12-29' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 30 Dec 2021 03:55:47 +0000 (13:55 +1000)]
Merge tag 'amd-drm-fixes-5.16-2021-12-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.16-2021-12-29:

amdgpu:
- Fencing fix
- XGMI fix
- VCN regression fix
- IP discovery regression fixes
- Fix runpm documentation
- Suspend/resume fixes
- Yellow Carp display fixes
- MCLK power management fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211229155129.5789-1-alexander.deucher@amd.com
2 years agoMerge tag 'mlx5-fixes-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 30 Dec 2021 02:19:01 +0000 (18:19 -0800)]
Merge tag 'mlx5-fixes-2021-12-28' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-12-28

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix wrong features assignment in case of error
  net/mlx5e: TC, Fix memory leak with rules with internal port
====================

Link: https://lore.kernel.org/r/20211229065352.30178-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'drm-intel-fixes-2021-12-29' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 30 Dec 2021 02:12:40 +0000 (12:12 +1000)]
Merge tag 'drm-intel-fixes-2021-12-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.16:
- Fix possible uninitialized variable
- Fix composite fence seqno icrement on each fence creation

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87h7ark5r5.fsf@intel.com
2 years agonet/ncsi: check for error return from call to nla_put_u32
Jiasheng Jiang [Wed, 29 Dec 2021 03:21:18 +0000 (11:21 +0800)]
net/ncsi: check for error return from call to nla_put_u32

As we can see from the comment of the nla_put() that it could return
-EMSGSIZE if the tailroom of the skb is insufficient.
Therefore, it should be better to check the return value of the
nla_put_u32 and return the error code if error accurs.
Also, there are many other functions have the same problem, and if this
patch is correct, I will commit a new version to fix all.

Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Link: https://lore.kernel.org/r/20211229032118.1706294-1-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
Nikolay Aleksandrov [Tue, 28 Dec 2021 15:31:42 +0000 (17:31 +0200)]
net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper

We need to first check if the context is a vlan one, then we need to
check the global bridge multicast vlan snooping flag, and finally the
vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast
processing (e.g. querier timers).

Fixes: 7b54aaaf53cb ("net: bridge: multicast: add vlan state initialization and control")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20211228153142.536969-1-nikolay@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: fix use-after-free in tw_timer_handler
Muchun Song [Tue, 28 Dec 2021 10:41:45 +0000 (18:41 +0800)]
net: fix use-after-free in tw_timer_handler

A real world panic issue was found as follow in Linux 5.4.

    BUG: unable to handle page fault for address: ffffde49a863de28
    PGD 7e6fe62067 P4D 7e6fe62067 PUD 7e6fe63067 PMD f51e064067 PTE 0
    RIP: 0010:tw_timer_handler+0x20/0x40
    Call Trace:
     <IRQ>
     call_timer_fn+0x2b/0x120
     run_timer_softirq+0x1ef/0x450
     __do_softirq+0x10d/0x2b8
     irq_exit+0xc7/0xd0
     smp_apic_timer_interrupt+0x68/0x120
     apic_timer_interrupt+0xf/0x20

This issue was also reported since 2017 in the thread [1],
unfortunately, the issue was still can be reproduced after fixing
DCCP.

The ipv4_mib_exit_net is called before tcp_sk_exit_batch when a net
namespace is destroyed since tcp_sk_ops is registered befrore
ipv4_mib_ops, which means tcp_sk_ops is in the front of ipv4_mib_ops
in the list of pernet_list. There will be a use-after-free on
net->mib.net_statistics in tw_timer_handler after ipv4_mib_exit_net
if there are some inflight time-wait timers.

This bug is not introduced by commit f2bf415cfed7 ("mib: add net to
NET_ADD_STATS_BH") since the net_statistics is a global variable
instead of dynamic allocation and freeing. Actually, commit
61a7e26028b9 ("mib: put net statistics on struct net") introduces
the bug since it put net statistics on struct net and free it when
net namespace is destroyed.

Moving init_ipv4_mibs() to the front of tcp_init() to fix this bug
and replace pr_crit() with panic() since continuing is meaningless
when init_ipv4_mibs() fails.

[1] https://groups.google.com/g/syzkaller/c/p1tn-_Kc6l4/m/smuL_FMAAgAJ?pli=1

Fixes: 61a7e26028b9 ("mib: put net statistics on struct net")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211228104145.9426-1-songmuchun@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: net: Fix a typo in udpgro_fwd.sh
Jianguo Wu [Wed, 29 Dec 2021 07:27:30 +0000 (15:27 +0800)]
selftests: net: Fix a typo in udpgro_fwd.sh

$rvs -> $rcv

Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Link: https://lore.kernel.org/r/d247d7c8-a03a-0abf-3c71-4006a051d133@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/net: udpgso_bench_tx: fix dst ip argument
wujianguo [Wed, 29 Dec 2021 10:58:10 +0000 (18:58 +0800)]
selftests/net: udpgso_bench_tx: fix dst ip argument

udpgso_bench_tx call setup_sockaddr() for dest address before
parsing all arguments, if we specify "-p ${dst_port}" after "-D ${dst_ip}",
then ${dst_port} will be ignored, and using default cfg_port 8000.

This will cause test case "multiple GRO socks" failed in udpgro.sh.

Setup sockaddr after parsing all arguments.

Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/ff620d9f-5b52-06ab-5286-44b945453002@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-bridge-mcast-add-and-enforce-query-interval-minimum'
Jakub Kicinski [Wed, 29 Dec 2021 20:59:43 +0000 (12:59 -0800)]
Merge branch 'net-bridge-mcast-add-and-enforce-query-interval-minimum'

Nikolay Aleksandrov says:

====================
net: bridge: mcast: add and enforce query interval minimum

This set adds and enforces 1 second minimum value for bridge multicast
query and startup query intervals in order to avoid rearming the timers
too often which could lock and crash the host. I doubt anyone is using
such low values or anything lower than 1 second, so it seems like a good
minimum. In order to be compatible if the value is lower then it is
overwritten and a log message is emitted, since we can't return an error
at this point.

Eric, I looked for the syzbot reports in its dashboard but couldn't find
them so I've added you as the reporter.

I've prepared a global bridge igmp rate limiting patch but wasn't
sure if it's ok for -net. It adds a static limit of 32k packets per
second, I plan to send it for net-next with added drop counters for
each bridge so it can be easily debugged.

Original report can be seen at:
https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/
====================

Link: https://lore.kernel.org/r/20211227172116.320768-1-nikolay@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bridge: mcast: add and enforce startup query interval minimum
Nikolay Aleksandrov [Mon, 27 Dec 2021 17:21:16 +0000 (19:21 +0200)]
net: bridge: mcast: add and enforce startup query interval minimum

As reported[1] if startup query interval is set too low in combination with
large number of startup queries and we have multiple bridges or even a
single bridge with multiple querier vlans configured we can crash the
machine. Add a 1 second minimum which must be enforced by overwriting the
value if set lower (i.e. without returning an error) to avoid breaking
user-space. If that happens a log message is emitted to let the admin know
that the startup interval has been set to the minimum. It doesn't make
sense to make the startup interval lower than the normal query interval
so use the same value of 1 second. The issue has been present since these
intervals could be user-controlled.

[1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/

Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bridge: mcast: add and enforce query interval minimum
Nikolay Aleksandrov [Mon, 27 Dec 2021 17:21:15 +0000 (19:21 +0200)]
net: bridge: mcast: add and enforce query interval minimum

As reported[1] if query interval is set too low and we have multiple
bridges or even a single bridge with multiple querier vlans configured
we can crash the machine. Add a 1 second minimum which must be enforced
by overwriting the value if set lower (i.e. without returning an error) to
avoid breaking user-space. If that happens a log message is emitted to let
the administrator know that the interval has been set to the minimum.
The issue has been present since these intervals could be user-controlled.

[1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/

Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6: raw: check passed optlen before reading
Tamir Duberstein [Wed, 29 Dec 2021 20:09:47 +0000 (15:09 -0500)]
ipv6: raw: check passed optlen before reading

Add a check that the user-provided option is at least as long as the
number of bytes we intend to read. Before this patch we would blindly
read sizeof(int) bytes even in cases where the user passed
optlen<sizeof(int), which would potentially read garbage or fault.

Discovered by new tests in https://github.com/google/gvisor/pull/6957 .

The original get_user call predates history in the git repo.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20211229200947.2862255-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 's390-5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Wed, 29 Dec 2021 18:07:20 +0000 (10:07 -0800)]
Merge tag 's390-5.16-6' of git://git./linux/kernel/git/s390/linux

Pull s390 fix from Heiko Carstens:

 - fix s390 mcount regex typo in recordmcount.pl

* tag 's390-5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  recordmcount.pl: fix typo in s390 mcount regex

2 years agoxsk: Initialise xskb free_list_node
Ciara Loftus [Mon, 20 Dec 2021 15:52:50 +0000 (15:52 +0000)]
xsk: Initialise xskb free_list_node

This commit initialises the xskb's free_list_node when the xskb is
allocated. This prevents a potential false negative returned from a call
to list_empty for that node, such as the one introduced in commit
199d983bc015 ("xsk: Fix crash on double free in buffer pool")

In my environment this issue caused packets to not be received by
the xdpsock application if the traffic was running prior to application
launch. This happened when the first batch of packets failed the xskmap
lookup and XDP_PASS was returned from the bpf program. This action is
handled in the i40e zc driver (and others) by allocating an skbuff,
freeing the xdp_buff and adding the associated xskb to the
xsk_buff_pool's free_list if it hadn't been added already. Without this
fix, the xskb is not added to the free_list because the check to determine
if it was added already returns an invalid positive result. Later, this
caused allocation errors in the driver and the failure to receive packets.

Fixes: 199d983bc015 ("xsk: Fix crash on double free in buffer pool")
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20211220155250.2746-1-ciara.loftus@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Fix wrong features assignment in case of error
Gal Pressman [Mon, 29 Nov 2021 09:08:41 +0000 (11:08 +0200)]
net/mlx5e: Fix wrong features assignment in case of error

In case of an error in mlx5e_set_features(), 'netdev->features' must be
updated with the correct state of the device to indicate which features
were updated successfully.
To do that we maintain a copy of 'netdev->features' and update it after
successful feature changes, so we can assign it to back to
'netdev->features' if needed.

However, since not all netdev features are handled by the driver (e.g.
GRO/TSO/etc), some features may not be updated correctly in case of an
error updating another feature.

For example, while requesting to disable TSO (feature which is not
handled by the driver) and enable HW-GRO, if an error occurs during
HW-GRO enable, 'oper_features' will be assigned with 'netdev->features'
and HW-GRO turned off. TSO will remain enabled in such case, which is a
bug.

To solve that, instead of using 'netdev->features' as the baseline of
'oper_features' and changing it on set feature success, use 'features'
instead and update it in case of errors.

Fixes: 75b81ce719b7 ("net/mlx5e: Don't override netdev features field unless in error flow")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Fix memory leak with rules with internal port
Roi Dayan [Wed, 22 Dec 2021 07:20:58 +0000 (09:20 +0200)]
net/mlx5e: TC, Fix memory leak with rules with internal port

Fix a memory leak with decap rule with internal port as destination
device. The driver allocates a modify hdr action but doesn't set
the flow attr modify hdr action which results in skipping releasing
the modify hdr action when releasing the flow.

backtrace:
    [<000000005f8c651c>] krealloc+0x83/0xd0
    [<000000009f59b143>] alloc_mod_hdr_actions+0x156/0x310 [mlx5_core]
    [<000000002257f342>] mlx5e_tc_match_to_reg_set_and_get_id+0x12a/0x360 [mlx5_core]
    [<00000000b44ea75a>] mlx5e_tc_add_fdb_flow+0x962/0x1470 [mlx5_core]
    [<0000000003e384a0>] __mlx5e_add_fdb_flow+0x54c/0xb90 [mlx5_core]
    [<00000000ed8b22b6>] mlx5e_configure_flower+0xe45/0x4af0 [mlx5_core]
    [<00000000024f4ab5>] mlx5e_rep_indr_offload.isra.0+0xfe/0x1b0 [mlx5_core]
    [<000000006c3bb494>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
    [<00000000d3dac2ea>] tc_setup_cb_add+0x1d2/0x420

Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Wed, 29 Dec 2021 00:19:09 +0000 (16:19 -0800)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-12-28

This series contains updates to igc driver only.

Vinicius disables support for crosstimestamp on i225-V as lockups are being
observed.

James McLaughlin fixes Tx timestamping support on non-MSI-X platforms.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: Fix TX timestamp support for non-MSI-X platforms
  igc: Do not enable crosstimestamping for i225-V models
====================

Link: https://lore.kernel.org/r/20211228182421.340354-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoionic: Initialize the 'lif->dbid_inuse' bitmap
Christophe JAILLET [Sun, 26 Dec 2021 14:06:17 +0000 (15:06 +0100)]
ionic: Initialize the 'lif->dbid_inuse' bitmap

When allocated, this bitmap is not initialized. Only the first bit is set a
few lines below.

Use bitmap_zalloc() to make sure that it is cleared before being used.

Fixes: 6461b446f2a0 ("ionic: Add interrupts and doorbells")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Link: https://lore.kernel.org/r/6a478eae0b5e6c63774e1f0ddb1a3f8c38fa8ade.1640527506.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodrm/amd/display: Changed pipe split policy to allow for multi-display pipe split
Angus Wang [Thu, 9 Dec 2021 22:27:01 +0000 (17:27 -0500)]
drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

[WHY]
Current implementation of pipe split policy prevents pipe split with
multiple displays connected, which caused the MCLK speed to be stuck at
max

[HOW]
Changed the pipe split policies so that pipe split is allowed for
multi-display configurations

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1522
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1709
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1655
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1403

Note this is a backport of this commit from amdgpu drm-next for 5.16.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Angus Wang <angus.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 years agodrm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
Nicholas Kazlauskas [Fri, 17 Dec 2021 19:18:59 +0000 (14:18 -0500)]
drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config

[Why]
A porting error on a previous patch left the block of code that
causes the crash from a NULL pointer dereference.

More specifically, we try to access link_enc before it's assigned in
the USB4 case in the following assignment:

config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;

[How]
That assignment occurs later depending on the ASIC version. It's only
needed on DCN31 and only after link_enc is already assigned.

Fixes: 986430446c917b ("drm/amd/display: fix a crash on USB4 over C20 PHY")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Set optimize_pwr_state for DCN31
Nicholas Kazlauskas [Thu, 9 Dec 2021 21:05:36 +0000 (16:05 -0500)]
drm/amd/display: Set optimize_pwr_state for DCN31

[Why]
We'll exit optimized power state to do link detection but we won't enter
back into the optimized power state.

This could potentially block s2idle entry depending on the sequencing,
but it also means we're losing some power during the transition period.

[How]
Hook up the handler like DCN21. It was also missed like the
exit_optimized_pwr_state callback.

Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
Nicholas Kazlauskas [Thu, 9 Dec 2021 18:53:36 +0000 (13:53 -0500)]
drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization

[Why]
Otherwise SMU won't mark Display as idle when trying to perform s2idle.

[How]
Mark the bit in the dcn31 codepath, doesn't apply to older ASIC.

It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.

Fixes: 118a33151658 ("drm/amd/display: Add DCN3.1 clock manager support")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Added power down for DCN10
Lai, Derek [Mon, 6 Dec 2021 09:10:59 +0000 (17:10 +0800)]
drm/amd/display: Added power down for DCN10

[Why]
The change of setting a timer callback on boot for 10 seconds is still
working, just lacked power down for DCN10.

[How]
Added power down for DCN10.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: fix B0 TMDS deepcolor no dislay issue
Charlene Liu [Mon, 6 Dec 2021 02:19:30 +0000 (21:19 -0500)]
drm/amd/display: fix B0 TMDS deepcolor no dislay issue

[why]
B0 PHY C map to F, D map to G driver use logic instance, dmub does the
remap. Driver still need use the right PHY instance to access right HW.

[how]
use phyical instance when program PHY register.

[note]
could move resync_control programming to dmub next.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agoMerge tag 'selinux-pr-20211228' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 28 Dec 2021 21:33:06 +0000 (13:33 -0800)]
Merge tag 'selinux-pr-20211228' of git://git./linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "One more small SELinux patch to address an uninitialized stack
  variable"

* tag 'selinux-pr-20211228' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: initialize proto variable in selinux_ip_postroute_compat()

2 years agoMerge tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux
Linus Torvalds [Tue, 28 Dec 2021 19:46:15 +0000 (11:46 -0800)]
Merge tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux

Pull auxdisplay fixes from Miguel Ojeda:
 "A couple of improvements for charlcd:

   - check pointer before dereferencing

   - fix coding style issue"

* tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux:
  auxdisplay: charlcd: checking for pointer reference before dereferencing
  auxdisplay: charlcd: fixing coding style issue

2 years agoMerge tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Tue, 28 Dec 2021 19:42:01 +0000 (11:42 -0800)]
Merge tag 'powerpc-5.16-5' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Fix DEBUG_WX never reporting any WX mappings, due to use of an
  incorrect config symbol since we converted to using generic ptdump"

* tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion

2 years agoigc: Fix TX timestamp support for non-MSI-X platforms
James McLaughlin [Fri, 17 Dec 2021 23:49:33 +0000 (16:49 -0700)]
igc: Fix TX timestamp support for non-MSI-X platforms

Time synchronization was not properly enabled on non-MSI-X platforms.

Fixes: 2c344ae24501 ("igc: Add support for TX timestamping")
Signed-off-by: James McLaughlin <james.mclaughlin@qsc.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoigc: Do not enable crosstimestamping for i225-V models
Vinicius Costa Gomes [Tue, 14 Dec 2021 00:39:49 +0000 (16:39 -0800)]
igc: Do not enable crosstimestamping for i225-V models

It was reported that when PCIe PTM is enabled, some lockups could
be observed with some integrated i225-V models.

While the issue is investigated, we can disable crosstimestamp for
those models and see no loss of functionality, because those models
don't have any support for time synchronization.

Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Link: https://lore.kernel.org/all/924175a188159f4e03bd69908a91e606b574139b.camel@gmx.de/
Reported-by: Stefan Dietrich <roots@gmx.de>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agodrm/amdgpu: no DC support for headless chips
Alex Deucher [Thu, 23 Dec 2021 19:13:02 +0000 (14:13 -0500)]
drm/amdgpu: no DC support for headless chips

Chips with no display hardware should return false for
DC support.

v2: drop Arcturus and Aldebaran

Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reported-by: Tareque Md.Hanif <tarequemd.hanif@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agoMerge branch 'smc-fixes'
David S. Miller [Tue, 28 Dec 2021 12:42:46 +0000 (12:42 +0000)]
Merge branch 'smc-fixes'

Dust Li says:

====================
net/smc: fix kernel panic caused by race of smc_sock

This patchset fixes the race between smc_release triggered by
close(2) and cdc_handle triggered by underlaying RDMA device.

The race is caused because the smc_connection may been released
before the pending tx CDC messages got its CQEs. In order to fix
this, I add a counter to track how many pending WRs we have posted
through the smc_connection, and only release the smc_connection
after there is no pending WRs on the connection.

The first patch prevents posting WR on a QP that is not in RTS
state. This patch is needed because if we post WR on a QP that
is not in RTS state, ib_post_send() may success but no CQE will
return, and that will confuse the counter tracking the pending
WRs.

The second patch add a counter to track how many WRs were posted
through the smc_connection, and don't reset the QP on link destroying
to prevent leak of the counter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: fix kernel panic caused by race of smc_sock
Dust Li [Tue, 28 Dec 2021 09:03:25 +0000 (17:03 +0800)]
net/smc: fix kernel panic caused by race of smc_sock

A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
but smc_release() has already freed it.

[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
<...>
[ 4570.711446] Call Trace:
[ 4570.711746]  <IRQ>
[ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083]  __do_softirq+0x123/0x2f4
[ 4570.714521]  irq_exit_rcu+0xc4/0xf0
[ 4570.714934]  common_interrupt+0xba/0xe0

Though smc_cdc_tx_handler() checked the existence of smc connection,
smc_release() may have already dismissed and released the smc socket
before smc_cdc_tx_handler() further visits it.

smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |
                               |smc_cdc_tx_dismiss_slots()
                               |      smc_cdc_tx_dismisser()
                               |
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

To make sure we won't receive any CDC messages after we free the
smc_sock, add a refcount on the smc_connection for inflight CDC
message(posted to the QP but haven't received related CQE), and
don't release the smc_connection until all the inflight CDC messages
haven been done, for both success or failed ones.

Using refcount on CDC messages brings another problem: when the link
is going to be destroyed, smcr_link_clear() will reset the QP, which
then remove all the pending CQEs related to the QP in the CQ. To make
sure all the CQEs will always come back so the refcount on the
smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
by smc_ib_modify_qp_error().
And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we
need to wait for all pending WQEs done, or we may encounter use-after-
free when handling CQEs.

For IB device removal routine, we need to wait for all the QPs on that
device been destroyed before we can destroy CQs on the device, or
the refcount on smc_connection won't reach 0 and smc_sock cannot be
released.

Fixes: 5f08318f617b ("smc: connection data control (CDC)")
Reported-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: don't send CDC/LLC message if link not ready
Dust Li [Tue, 28 Dec 2021 09:03:24 +0000 (17:03 +0800)]
net/smc: don't send CDC/LLC message if link not ready

We found smc_llc_send_link_delete_all() sometimes wait
for 2s timeout when testing with RDMA link up/down.
It is possible when a smc_link is in ACTIVATING state,
the underlaying QP is still in RESET or RTR state, which
cannot send any messages out.

smc_llc_send_link_delete_all() use smc_link_usable() to
checks whether the link is usable, if the QP is still in
RESET or RTR state, but the smc_link is in ACTIVATING, this
LLC message will always fail without any CQE entering the
CQ, and we will always wait 2s before timeout.

Since we cannot send any messages through the QP before
the QP enter RTS. I add a wrapper smc_link_sendable()
which checks the state of QP along with the link state.
And replace smc_link_usable() with smc_link_sendable()
in all LLC & CDC message sending routine.

Fixes: 5f08318f617b ("smc: connection data control (CDC)")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoNFC: st21nfca: Fix memory leak in device probe and remove
Wei Yongjun [Tue, 28 Dec 2021 12:48:11 +0000 (12:48 +0000)]
NFC: st21nfca: Fix memory leak in device probe and remove

'phy->pending_skb' is alloced when device probe, but forgot to free
in the error handling path and remove path, this cause memory leak
as follows:

unreferenced object 0xffff88800bc06800 (size 512):
  comm "8", pid 11775, jiffies 4295159829 (age 9.032s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000d66c09ce>] __kmalloc_node_track_caller+0x1ed/0x450
    [<00000000c93382b3>] kmalloc_reserve+0x37/0xd0
    [<000000005fea522c>] __alloc_skb+0x124/0x380
    [<0000000019f29f9a>] st21nfca_hci_i2c_probe+0x170/0x8f2

Fix it by freeing 'pending_skb' in error and remove.

Fixes: 68957303f44a ("NFC: ST21NFCA: Add driver for STMicroelectronics ST21NFCA NFC Chip")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lantiq_xrx200: fix statistics of received bytes
Aleksander Jan Bajkowski [Mon, 27 Dec 2021 16:22:03 +0000 (17:22 +0100)]
net: lantiq_xrx200: fix statistics of received bytes

Received frames have FCS truncated. There is no need
to subtract FCS length from the statistics.

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ag71xx: Fix a potential double free in error handling paths
Christophe JAILLET [Sun, 26 Dec 2021 17:51:44 +0000 (18:51 +0100)]
net: ag71xx: Fix a potential double free in error handling paths

'ndev' is a managed resource allocated with devm_alloc_etherdev(), so there
is no need to call free_netdev() explicitly or there will be a double
free().

Simplify all error handling paths accordingly.

Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomISDN: change function names to avoid conflicts
wolfgang huang [Tue, 28 Dec 2021 08:01:20 +0000 (16:01 +0800)]
mISDN: change function names to avoid conflicts

As we build for mips, we meet following error. l1_init error with
multiple definition. Some architecture devices usually marked with
l1, l2, lxx as the start-up phase. so we change the mISDN function
names, align with Isdnl2_xxx.

mips-linux-gnu-ld: drivers/isdn/mISDN/layer1.o: in function `l1_init':
(.text+0x890): multiple definition of `l1_init'; \
arch/mips/kernel/bmips_5xxx_init.o:(.text+0xf0): first defined here
make[1]: *** [home/mips/kernel-build/linux/Makefile:1161: vmlinux] Error 1

Signed-off-by: wolfgang huang <huangjinhui@kylinos.cn>
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodrm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
Evan Quan [Fri, 17 Dec 2021 11:05:06 +0000 (19:05 +0800)]
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform

By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and
put itself into a state ready for PNP. That can workaround some random resuming
failure observed on BOCO capable platforms.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: always reset the asic in suspend (v2)
Alex Deucher [Fri, 12 Nov 2021 16:25:30 +0000 (11:25 -0500)]
drm/amdgpu: always reset the asic in suspend (v2)

If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.

v2: handle s0ix

Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agoMerge tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 27 Dec 2021 16:58:35 +0000 (08:58 -0800)]
Merge tag 'efi-urgent-for-v5.16-2' of git://git./linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Another EFI fix for v5.16:

   - Prevent missing prototype warning from breaking the build under
     CONFIG_WERROR=y"

* tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Move efifb_setup_from_dmi() prototype from arch headers

2 years agodrm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
Prike Liang [Mon, 13 Dec 2021 08:17:02 +0000 (16:17 +0800)]
drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume

In the s0ix entry need retain gfx in the gfxoff state,so here need't
set gfx cgpg in the S0ix suspend-resume process. Moreover move the S0ix
check into SMU12 can simplify the code condition check.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1712
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agoselinux: initialize proto variable in selinux_ip_postroute_compat()
Tom Rix [Fri, 24 Dec 2021 15:07:39 +0000 (07:07 -0800)]
selinux: initialize proto variable in selinux_ip_postroute_compat()

Clang static analysis reports this warning

hooks.c:5765:6: warning: 4th function call argument is an uninitialized
                value
        if (selinux_xfrm_postroute_last(sksec->sid, skb, &ad, proto))
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

selinux_parse_skb() can return ok without setting proto.  The later call
to selinux_xfrm_postroute_last() does an early check of proto and can
return ok if the garbage proto value matches.  So initialize proto.

Cc: stable@vger.kernel.org
Fixes: eef9b41622f2 ("selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last()")
Signed-off-by: Tom Rix <trix@redhat.com>
[PM: typo/spelling and checkpatch.pl description fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2 years agonfc: uapi: use kernel size_t to fix user-space builds
Krzysztof Kozlowski [Sun, 26 Dec 2021 12:03:47 +0000 (13:03 +0100)]
nfc: uapi: use kernel size_t to fix user-space builds

Fix user-space builds if it includes /usr/include/linux/nfc.h before
some of other headers:

  /usr/include/linux/nfc.h:281:9: error: unknown type name â€˜size_t’
    281 |         size_t service_name_len;
        |         ^~~~~~

Fixes: d646960f7986 ("NFC: Initial LLCP support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agouapi: fix linux/nfc.h userspace compilation errors
Dmitry V. Levin [Sun, 26 Dec 2021 13:01:27 +0000 (16:01 +0300)]
uapi: fix linux/nfc.h userspace compilation errors

Replace sa_family_t with __kernel_sa_family_t to fix the following
linux/nfc.h userspace compilation errors:

/usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t'
  sa_family_t sa_family;
/usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t'
  sa_family_t sa_family;

Fixes: 23b7869c0fd0 ("NFC: add the NFC socket raw protocol")
Fixes: d646960f7986 ("NFC: Initial LLCP support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: usb: pegasus: Do not drop long Ethernet frames
Matthias-Christian Ott [Sun, 26 Dec 2021 22:12:08 +0000 (23:12 +0100)]
net: usb: pegasus: Do not drop long Ethernet frames

The D-Link DSB-650TX (2001:4002) is unable to receive Ethernet frames
that are longer than 1518 octets, for example, Ethernet frames that
contain 802.1Q VLAN tags.

The frames are sent to the pegasus driver via USB but the driver
discards them because they have the Long_pkt field set to 1 in the
received status report. The function read_bulk_callback of the pegasus
driver treats such received "packets" (in the terminology of the
hardware) as errors but the field simply does just indicate that the
Ethernet frame (MAC destination to FCS) is longer than 1518 octets.

It seems that in the 1990s there was a distinction between
"giant" (> 1518) and "runt" (< 64) frames and the hardware includes
flags to indicate this distinction. It seems that the purpose of the
distinction "giant" frames was to not allow infinitely long frames due
to transmission errors and to allow hardware to have an upper limit of
the frame size. However, the hardware already has such limit with its
2048 octet receive buffer and, therefore, Long_pkt is merely a
convention and should not be treated as a receive error.

Actually, the hardware is even able to receive Ethernet frames with 2048
octets which exceeds the claimed limit frame size limit of the driver of
1536 octets (PEGASUS_MTU).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Matthias-Christian Ott <ott@mirix.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoatlantic: Fix buff_ring OOB in aq_ring_rx_clean
Zekun Shen [Mon, 27 Dec 2021 02:32:45 +0000 (21:32 -0500)]
atlantic: Fix buff_ring OOB in aq_ring_rx_clean

The function obtain the next buffer without boundary check.
We should return with I/O error code.

The bug is found by fuzzing and the crash report is attached.
It is an OOB bug although reported as use-after-free.

[    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
[    4.806505]
[    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #34
[    4.809030] Call Trace:
[    4.809343]  dump_stack+0x76/0xa0
[    4.809755]  print_address_description.constprop.0+0x16/0x200
[    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.813183]  __kasan_report.cold+0x37/0x7c
[    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.814393]  kasan_report+0xe/0x20
[    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
[    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
[    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
[    4.817746]  ? __next_timer_interrupt+0xba/0xf0
[    4.818322]  net_rx_action+0x363/0xbd0
[    4.818803]  ? call_timer_fn+0x240/0x240
[    4.819302]  ? __switch_to_asm+0x40/0x70
[    4.819809]  ? napi_busy_loop+0x520/0x520
[    4.820324]  __do_softirq+0x18c/0x634
[    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
[    4.821343]  run_ksoftirqd+0x15/0x20
[    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
[    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.823041]  ? __kthread_parkme+0x80/0x100
[    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.824301]  kthread+0x2b5/0x3b0
[    4.824723]  ? kthread_create_on_node+0xd0/0xd0
[    4.825304]  ret_from_fork+0x35/0x40

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: udp: fix alignment problem in udp4_seq_show()
yangxingwu [Mon, 27 Dec 2021 08:29:51 +0000 (16:29 +0800)]
net: udp: fix alignment problem in udp4_seq_show()

$ cat /pro/net/udp

before:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when
26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000

after:

   sl  local_address rem_address   st tx_queue rx_queue tr tm->when
26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: fix using of uninitialized completions
Karsten Graul [Mon, 27 Dec 2021 13:35:30 +0000 (14:35 +0100)]
net/smc: fix using of uninitialized completions

In smc_wr_tx_send_wait() the completion on index specified by
pend->idx is initialized and after smc_wr_tx_send() was called the wait
for completion starts. pend->idx is used to get the correct index for
the wait, but the pend structure could already be cleared in
smc_wr_tx_process_cqe().
Introduce pnd_idx to hold and use a local copy of the correct index.

Fixes: 09c61d24f96d ("net/smc: wait for departure of an IB message")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
William Zhao [Thu, 23 Dec 2021 17:33:16 +0000 (12:33 -0500)]
ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate

The "__ip6_tnl_parm" struct was left uninitialized causing an invalid
load of random data when the "__ip6_tnl_parm" struct was used elsewhere.
As an example, in the function "ip6_tnl_xmit_ctl()", it tries to access
the "collect_md" member. With "__ip6_tnl_parm" being uninitialized and
containing random data, the UBSAN detected that "collect_md" held a
non-boolean value.

The UBSAN issue is as follows:
===============================================================
UBSAN: invalid-load in net/ipv6/ip6_tunnel.c:1025:14
load of value 30 is not a valid value for type '_Bool'
CPU: 1 PID: 228 Comm: kworker/1:3 Not tainted 5.16.0-rc4+ #8
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x57
ubsan_epilogue+0x5/0x40
__ubsan_handle_load_invalid_value+0x66/0x70
? __cpuhp_setup_state+0x1d3/0x210
ip6_tnl_xmit_ctl.cold.52+0x2c/0x6f [ip6_tunnel]
vti6_tnl_xmit+0x79c/0x1e96 [ip6_vti]
? lock_is_held_type+0xd9/0x130
? vti6_rcv+0x100/0x100 [ip6_vti]
? lock_is_held_type+0xd9/0x130
? rcu_read_lock_bh_held+0xc0/0xc0
? lock_acquired+0x262/0xb10
dev_hard_start_xmit+0x1e6/0x820
__dev_queue_xmit+0x2079/0x3340
? mark_lock.part.52+0xf7/0x1050
? netdev_core_pick_tx+0x290/0x290
? kvm_clock_read+0x14/0x30
? kvm_sched_clock_read+0x5/0x10
? sched_clock_cpu+0x15/0x200
? find_held_lock+0x3a/0x1c0
? lock_release+0x42f/0xc90
? lock_downgrade+0x6b0/0x6b0
? mark_held_locks+0xb7/0x120
? neigh_connected_output+0x31f/0x470
? lockdep_hardirqs_on+0x79/0x100
? neigh_connected_output+0x31f/0x470
? ip6_finish_output2+0x9b0/0x1d90
? rcu_read_lock_bh_held+0x62/0xc0
? ip6_finish_output2+0x9b0/0x1d90
ip6_finish_output2+0x9b0/0x1d90
? ip6_append_data+0x330/0x330
? ip6_mtu+0x166/0x370
? __ip6_finish_output+0x1ad/0xfb0
? nf_hook_slow+0xa6/0x170
ip6_output+0x1fb/0x710
? nf_hook.constprop.32+0x317/0x430
? ip6_finish_output+0x180/0x180
? __ip6_finish_output+0xfb0/0xfb0
? lock_is_held_type+0xd9/0x130
ndisc_send_skb+0xb33/0x1590
? __sk_mem_raise_allocated+0x11cf/0x1560
? dst_output+0x4a0/0x4a0
? ndisc_send_rs+0x432/0x610
addrconf_dad_completed+0x30c/0xbb0
? addrconf_rs_timer+0x650/0x650
? addrconf_dad_work+0x73c/0x10e0
addrconf_dad_work+0x73c/0x10e0
? addrconf_dad_completed+0xbb0/0xbb0
? rcu_read_lock_sched_held+0xaf/0xe0
? rcu_read_lock_bh_held+0xc0/0xc0
process_one_work+0x97b/0x1740
? pwq_dec_nr_in_flight+0x270/0x270
worker_thread+0x87/0xbf0
? process_one_work+0x1740/0x1740
kthread+0x3ac/0x490
? set_kthread_struct+0x100/0x100
ret_from_fork+0x22/0x30
</TASK>
===============================================================

The solution is to initialize "__ip6_tnl_parm" struct to zeros in the
"vti6_siocdevprivate()" function.

Signed-off-by: William Zhao <wizhao@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodrm/i915: Increment composite fence seqno
Matthew Brost [Tue, 14 Dec 2021 19:59:13 +0000 (11:59 -0800)]
drm/i915: Increment composite fence seqno

Increment composite fence seqno on each fence creation.

Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214195913.35735-1-matthew.brost@intel.com
(cherry picked from commit 62eeb9ae1364cd96991ccc6e3c5c69d66b8c64df)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2 years agodrm/i915: Fix possible uninitialized variable in parallel extension
Matthew Brost [Sun, 19 Dec 2021 00:19:09 +0000 (16:19 -0800)]
drm/i915: Fix possible uninitialized variable in parallel extension

'prev_engine' was declared inside the output loop and checked in the
inner after at least 1 pass of either loop. The variable should be
declared outside both loops as it needs to be persistent across the
entire loop structure.

Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211219001909.24348-1-matthew.brost@intel.com
(cherry picked from commit cbffbac9c14220b8716b0a9c29d72243f6b14ef3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2 years agoLinux 5.16-rc7
Linus Torvalds [Sun, 26 Dec 2021 21:17:17 +0000 (13:17 -0800)]
Linux 5.16-rc7

2 years agoMerge tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Dec 2021 18:28:55 +0000 (10:28 -0800)]
Merge tag 'x86_urgent_for_v5.16_rc7' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prevent potential undefined behavior due to shifting pkey constants
   into the sign bit

 - Move the EFI memory reservation code *after* the efi= cmdline parsing
   has happened

 - Revert two commits which turned out to be the wrong direction to
   chase when accommodating early memblock reservations consolidation
   and command line parameters parsing

* tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkey: Fix undefined behaviour with PKRU_WD_BIT
  x86/boot: Move EFI range reservation after cmdline parsing
  Revert "x86/boot: Pull up cmdline preparation and early param parsing"
  Revert "x86/boot: Mark prepare_command_line() __init"

2 years agoMerge tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 Dec 2021 18:19:40 +0000 (10:19 -0800)]
Merge tag 'objtool_urgent_for_v5.16_rc7' of git://git./linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Prevent clang from reordering the reachable annotation in
   an inline asm statement without inputs

 - Fix objtool builds on non-glibc systems due to undefined
   __always_inline

* tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compiler.h: Fix annotation macro misplacement with Clang
  uapi: Fix undefined __always_inline on non-glibc systems

2 years agoMerge tag 'pinctrl-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sun, 26 Dec 2021 04:00:09 +0000 (20:00 -0800)]
Merge tag 'pinctrl-v5.16-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Some hopefully final pin control fixes for the v5.16 kernel:

   - Fix an out-of-bounds bug in the Mediatek driver

   - Fix an init order bug in the Broadcom BCM2835 driver

   - Fix a GPIO offset bug in the STM32 driver"

* tag 'pinctrl-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: stm32: consider the GPIO offset to expose all the GPIO lines
  pinctrl: bcm2835: Change init order for gpio hogs
  pinctrl: mediatek: fix global-out-of-bounds issue

2 years agoMerge tag 'hwmon-for-v5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 Dec 2021 21:08:22 +0000 (13:08 -0800)]
Merge tag 'hwmon-for-v5.16-rc7' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "A couple of lm90 driver fixes. None of them are critical, but they
  should nevertheless be fixed"

* tag 'hwmon-for-v5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (lm90) Do not report 'busy' status bit as alarm
  hwmom: (lm90) Fix citical alarm status for MAX6680/MAX6681
  hwmon: (lm90) Drop critical attribute support for MAX6654
  hwmon: (lm90) Prevent integer overflow/underflow in hysteresis calculations
  hwmon: (lm90) Fix usage of CONFIG2 register in detect function

2 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 25 Dec 2021 21:00:14 +0000 (13:00 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "A few small updates to drivers.

  Of note we are now deferring probes of i8042 on some Asus devices as
  the controller is not ready to respond to queries first time around
  when the driver is compiled into the kernel"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elants_i2c - do not check Remark ID on eKTH3900/eKTH5312
  Input: atmel_mxt_ts - fix double free in mxt_read_info_block
  Input: goodix - fix memory leak in goodix_firmware_upload
  Input: goodix - add id->model mapping for the "9111" model
  Input: goodix - try not to touch the reset-pin on x86/ACPI devices
  Input: i8042 - enable deferred probe quirk for ASUS UM325UA
  Input: elantech - fix stack out of bound access in elantech_change_report_id()
  Input: iqs626a - prohibit inlining of channel parsing functions
  Input: i8042 - add deferred probe support

2 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 25 Dec 2021 20:30:03 +0000 (12:30 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "9 patches.

  Subsystems affected by this patch series: mm (kfence, mempolicy,
  memory-failure, pagemap, pagealloc, damon, and memory-failure),
  core-kernel, and MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
  mm/damon/dbgfs: protect targets destructions with kdamond_lock
  mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
  mm: delete unsafe BUG from page_cache_add_speculative()
  mm, hwpoison: fix condition in free hugetlb page path
  MAINTAINERS: mark more list instances as moderated
  kernel/crash_core: suppress unknown crashkernel parameter warning
  mm: mempolicy: fix THP allocations escaping mempolicy restrictions
  kfence: fix memory leak when cat kfence objects

2 years agomm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
Liu Shixin [Sat, 25 Dec 2021 05:12:58 +0000 (21:12 -0800)]
mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()

Hulk Robot reported a panic in put_page_testzero() when testing
madvise() with MADV_SOFT_OFFLINE.  The BUG() is triggered when retrying
get_any_page().  This is because we keep MF_COUNT_INCREASED flag in
second try but the refcnt is not increased.

    page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:737!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU: 5 PID: 2135 Comm: sshd Tainted: G    B             5.16.0-rc6-dirty #373
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: release_pages+0x53f/0x840
    Call Trace:
      free_pages_and_swap_cache+0x64/0x80
      tlb_flush_mmu+0x6f/0x220
      unmap_page_range+0xe6c/0x12c0
      unmap_single_vma+0x90/0x170
      unmap_vmas+0xc4/0x180
      exit_mmap+0xde/0x3a0
      mmput+0xa3/0x250
      do_exit+0x564/0x1470
      do_group_exit+0x3b/0x100
      __do_sys_exit_group+0x13/0x20
      __x64_sys_exit_group+0x16/0x20
      do_syscall_64+0x34/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    Modules linked in:
    ---[ end trace e99579b570fe0649 ]---
    RIP: 0010:release_pages+0x53f/0x840

Link: https://lkml.kernel.org/r/20211221074908.3910286-1-liushixin2@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/damon/dbgfs: protect targets destructions with kdamond_lock
SeongJae Park [Sat, 25 Dec 2021 05:12:54 +0000 (21:12 -0800)]
mm/damon/dbgfs: protect targets destructions with kdamond_lock

DAMON debugfs interface iterates current monitoring targets in
'dbgfs_target_ids_read()' while holding the corresponding
'kdamond_lock'.  However, it also destructs the monitoring targets in
'dbgfs_before_terminate()' without holding the lock.  This can result in
a use_after_free bug.  This commit avoids the race by protecting the
destruction with the corresponding 'kdamond_lock'.

Link: https://lkml.kernel.org/r/20211221094447.2241-1-sj@kernel.org
Reported-by: Sangwoo Bae <sangwoob@amazon.com>
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
Thibaut Sautereau [Sat, 25 Dec 2021 05:12:51 +0000 (21:12 -0800)]
mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid

The second parameter of alloc_pages_exact_nid is the one indicating the
size of memory pointed by the returned pointer.

Link: https://lkml.kernel.org/r/YbjEgwhn4bGblp//@coeus
Fixes: abd58f38dfb4 ("mm/page_alloc: add __alloc_size attributes for better bounds checking")
Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Levente Polyak <levente@leventepolyak.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm: delete unsafe BUG from page_cache_add_speculative()
Hugh Dickins [Sat, 25 Dec 2021 05:12:48 +0000 (21:12 -0800)]
mm: delete unsafe BUG from page_cache_add_speculative()

It is not easily reproducible, but on 5.16-rc I have several times hit
the VM_BUG_ON_PAGE(PageTail(page), page) in
page_cache_add_speculative(): usually from filemap_get_read_batch() for
an ext4 read, yesterday from next_uptodate_page() from
filemap_map_pages() for a shmem fault.

That BUG used to be placed where page_ref_add_unless() had succeeded,
but now it is placed before folio_ref_add_unless() is attempted: that is
not safe, since it is only the acquired reference which makes the page
safe from racing THP collapse or split.

We could keep the BUG, checking PageTail only when
folio_ref_try_add_rcu() has succeeded; but I don't think it adds much
value - just delete it.

Link: https://lkml.kernel.org/r/8b98fc6f-3439-8614-c3f3-945c659a1aba@google.com
Fixes: 020853b6f5ea ("mm: Add folio_try_get_rcu()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm, hwpoison: fix condition in free hugetlb page path
Naoya Horiguchi [Sat, 25 Dec 2021 05:12:45 +0000 (21:12 -0800)]
mm, hwpoison: fix condition in free hugetlb page path

When a memory error hits a tail page of a free hugepage,
__page_handle_poison() is expected to be called to isolate the error in
4kB unit, but it's not called due to the outdated if-condition in
memory_failure_hugetlb().  This loses the chance to isolate the error in
the finer unit, so it's not optimal.  Drop the condition.

This "(p != head && TestSetPageHWPoison(head)" condition is based on the
old semantics of PageHWPoison on hugepage (where PG_hwpoison flag was
set on the subpage), so it's not necessray any more.  By getting to set
PG_hwpoison on head page for hugepages, concurrent error events on
different subpages in a single hugepage can be prevented by
TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb().
So dropping the condition should not reopen the race window originally
mentioned in commit b985194c8c0a ("hwpoison, hugetlb:
lock_page/unlock_page does not match for handling a free hugepage")

[naoya.horiguchi@linux.dev: fix "HardwareCorrupted" counter]
Link: https://lkml.kernel.org/r/20211220084851.GA1460264@u2004
Link: https://lkml.kernel.org/r/20211210110208.879740-1-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Fei Luo <luofei@unicloud.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMAINTAINERS: mark more list instances as moderated
Randy Dunlap [Sat, 25 Dec 2021 05:12:42 +0000 (21:12 -0800)]
MAINTAINERS: mark more list instances as moderated

Some lists that are moderated are not marked as moderated consistently,
so mark them all as moderated.

Link: https://lkml.kernel.org/r/20211209001330.18558-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Conor Culhane <conor.culhane@silvaco.com>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Jianjun Wang <jianjun.wang@mediatek.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agokernel/crash_core: suppress unknown crashkernel parameter warning
Philipp Rudo [Sat, 25 Dec 2021 05:12:39 +0000 (21:12 -0800)]
kernel/crash_core: suppress unknown crashkernel parameter warning

When booting with crashkernel= on the kernel command line a warning
similar to

    Kernel command line: ro console=ttyS0 crashkernel=256M
    Unknown kernel command line parameters "crashkernel=256M", will be passed to user space.

is printed.

This comes from crashkernel= being parsed independent from the kernel
parameter handling mechanism.  So the code in init/main.c doesn't know
that crashkernel= is a valid kernel parameter and prints this incorrect
warning.

Suppress the warning by adding a dummy early_param handler for
crashkernel=.

Link: https://lkml.kernel.org/r/20211208133443.6867-1-prudo@redhat.com
Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters")
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agomm: mempolicy: fix THP allocations escaping mempolicy restrictions
Andrey Ryabinin [Sat, 25 Dec 2021 05:12:35 +0000 (21:12 -0800)]
mm: mempolicy: fix THP allocations escaping mempolicy restrictions

alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:

page = __alloc_pages_node(hpage_node,
gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

if (!page && (gfp & __GFP_DIRECT_RECLAIM))
     page = __alloc_pages_node(hpage_node,
gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c18911f
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

The bug disappeared later in the commit 89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared again in slightly different form in the commit 76e654cc91bb
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised")

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:

    $ mount -oremount,size=4G,huge=always /dev/shm/
    $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
    $ cat mbind_thp.c
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <numaif.h>

    #define SIZE 2ULL << 30
    int main(int argc, char **argv)
    {
        int fd;
        unsigned long long i;
        char *addr;
        pid_t pid;
        char buf[100];
        unsigned long nodemask = 1;

        fd = open("/dev/shm/test", O_RDWR|O_CREAT);
        assert(fd > 0);
        assert(ftruncate(fd, SIZE) == 0);

        addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED, fd, 0);

        assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
        for (i = 0; i < SIZE; i+=4096) {
          addr[i] = 1;
        }
        pid = getpid();
        snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
        system(buf);
        sleep(10000);

        return 0;
    }
    $ gcc mbind_thp.c -o mbind_thp -lnuma
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2
    node 0 size: 1918 MB
    node 0 free: 1595 MB
    node 1 cpus: 1 3
    node 1 size: 2014 MB
    node 1 free: 1731 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
    $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
    7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agokfence: fix memory leak when cat kfence objects
Baokun Li [Sat, 25 Dec 2021 05:12:32 +0000 (21:12 -0800)]
kfence: fix memory leak when cat kfence objects

Hulk robot reported a kmemleak problem:

    unreferenced object 0xffff93d1d8cc02e8 (size 248):
      comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
      hex dump (first 32 bytes):
        00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00  .@..............
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
         seq_open+0x2a/0x80
         full_proxy_open+0x167/0x1e0
         do_dentry_open+0x1e1/0x3a0
         path_openat+0x961/0xa20
         do_filp_open+0xae/0x120
         do_sys_openat2+0x216/0x2f0
         do_sys_open+0x57/0x80
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
    unreferenced object 0xffff93d419854000 (size 4096):
      comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
      hex dump (first 32 bytes):
        6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30  kfence-#250: 0x0
        30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d  0000000754bda12-
      backtrace:
         seq_read_iter+0x313/0x440
         seq_read+0x14b/0x1a0
         full_proxy_read+0x56/0x80
         vfs_read+0xa5/0x1b0
         ksys_read+0xa0/0xf0
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9

I find that we can easily reproduce this problem with the following
commands:

cat /sys/kernel/debug/kfence/objects
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

The leaked memory is allocated in the stack below:

    do_syscall_64
      do_sys_open
        do_dentry_open
          full_proxy_open
            seq_open            ---> alloc seq_file
      vfs_read
        full_proxy_read
          seq_read
            seq_read_iter
              traverse          ---> alloc seq_buf

And it should have been released in the following process:

    do_syscall_64
      syscall_exit_to_user_mode
        exit_to_user_mode_prepare
          task_work_run
            ____fput
              __fput
                full_proxy_release  ---> free here

However, the release function corresponding to file_operations is not
implemented in kfence.  As a result, a memory leak occurs.  Therefore,
the solution to this problem is to implement the corresponding release
function.

Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoselftests: mptcp: Remove the deprecated config NFT_COUNTER
Ma Xinjian [Fri, 24 Dec 2021 09:59:28 +0000 (17:59 +0800)]
selftests: mptcp: Remove the deprecated config NFT_COUNTER

NFT_COUNTER was removed since
390ad4295aa ("netfilter: nf_tables: make counter support built-in")
LKP/0Day will check if all configs listing under selftests are able to
be enabled properly.

For the missing configs, it will report something like:
LKP WARN miss config CONFIG_NFT_COUNTER= of net/mptcp/config

- it's not reasonable to keep the deprecated configs.
- configs under kselftests are recommended by corresponding tests.
So if some configs are missing, it will impact the testing results

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ma Xinjian <xinjianx.ma@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosctp: use call_rcu to free endpoint
Xin Long [Thu, 23 Dec 2021 18:04:30 +0000 (13:04 -0500)]
sctp: use call_rcu to free endpoint

This patch is to delay the endpoint free by calling call_rcu() to fix
another use-after-free issue in sctp_sock_dump():

  BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
  Call Trace:
    __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
    lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
    _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
    spin_lock_bh include/linux/spinlock.h:334 [inline]
    __lock_sock+0x203/0x350 net/core/sock.c:2253
    lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
    lock_sock include/net/sock.h:1492 [inline]
    sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
    sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
    sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
    __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
    inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
    netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
    __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
    netlink_dump_start include/linux/netlink.h:216 [inline]
    inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
    __sock_diag_cmd net/core/sock_diag.c:232 [inline]
    sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
    netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
    sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274

This issue occurs when asoc is peeled off and the old sk is freed after
getting it by asoc->base.sk and before calling lock_sock(sk).

To prevent the sk free, as a holder of the sk, ep should be alive when
calling lock_sock(). This patch uses call_rcu() and moves sock_put and
ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
hold the ep under rcu_read_lock in sctp_transport_traverse_process().

If sctp_endpoint_hold() returns true, it means this ep is still alive
and we have held it and can continue to dump it; If it returns false,
it means this ep is dead and can be freed after rcu_read_unlock, and
we should skip it.

In sctp_sock_dump(), after locking the sk, if this ep is different from
tsp->asoc->ep, it means during this dumping, this asoc was peeled off
before calling lock_sock(), and the sk should be skipped; If this ep is
the same with tsp->asoc->ep, it means no peeloff happens on this asoc,
and due to lock_sock, no peeloff will happen either until release_sock.

Note that delaying endpoint free won't delay the port release, as the
port release happens in sctp_endpoint_destroy() before calling call_rcu().
Also, freeing endpoint by call_rcu() makes it safe to access the sk by
asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().

Thanks Jones to bring this issue up.

v1->v2:
  - improve the changelog.
  - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.

Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com
Reported-by: Lee Jones <lee.jones@linaro.org>
Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register
Miaoqian Lin [Fri, 24 Dec 2021 02:14:59 +0000 (02:14 +0000)]
net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register

The fixed_phy_get_gpiod function() returns NULL, it doesn't return error
pointers, using NULL checking to fix this.i

Fixes: 5468e82f7034 ("net: phy: fixed-phy: Drop GPIO from fixed_phy_add()")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20211224021500.10362-1-linmq006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Fri, 24 Dec 2021 17:02:24 +0000 (09:02 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - fix nommu after getting rid of mini-stack for ARMv7

 - fix Thumb2 bug in iWMMXt exception handling

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handling
  ARM: 9160/1: NOMMU: Reload __secondary_data after PROCINFO_INITFUNC

2 years agoMerge tag 'platform-drivers-x86-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 24 Dec 2021 16:58:23 +0000 (08:58 -0800)]
Merge tag 'platform-drivers-x86-v5.16-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Various bug-fixes"

* tag 'platform-drivers-x86-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: intel_pmc_core: fix memleak on registration failure
  platform/x86/intel: Remove X86_PLATFORM_DRIVERS_INTEL
  platform/x86: system76_acpi: Guard System76 EC specific functionality
  platform/x86: apple-gmux: use resource_size() with res
  platform/x86: amd-pmc: only use callbacks for suspend
  platform/mellanox: mlxbf-pmc: Fix an IS_ERR() vs NULL bug in mlxbf_pmc_map_counters

2 years agorecordmcount.pl: fix typo in s390 mcount regex
Heiko Carstens [Thu, 23 Dec 2021 16:43:14 +0000 (17:43 +0100)]
recordmcount.pl: fix typo in s390 mcount regex

Commit 85bf17b28f97 ("recordmcount.pl: look for jgnop instruction as well
as bcrl on s390") added a new alternative mnemonic for the existing brcl
instruction. This is required for the combination old gcc version (pre 9.0)
and binutils since version 2.37.
However at the same time this commit introduced a typo, replacing brcl with
bcrl. As a result no mcount locations are detected anymore with old gcc
versions (pre 9.0) and binutils before version 2.37.
Fix this by using the correct mnemonic again.

Reported-by: Miroslav Benes <mbenes@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 85bf17b28f97 ("recordmcount.pl: look for jgnop instruction as well as bcrl on s390")
Link: https://lore.kernel.org/r/alpine.LSU.2.21.2112230949520.19849@pobox.suse.cz
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2 years agoselftests: Calculate udpgso segment count without header adjustment
Coco Li [Thu, 23 Dec 2021 22:24:41 +0000 (22:24 +0000)]
selftests: Calculate udpgso segment count without header adjustment

The below referenced commit correctly updated the computation of number
of segments (gso_size) by using only the gso payload size and
removing the header lengths.

With this change the regression test started failing. Update
the tests to match this new behavior.

Both IPv4 and IPv6 tests are updated, as a separate patch in this series
will update udp_v6_send_skb to match this change in udp_send_skb.

Fixes: 158390e45612 ("udp: using datalen to cap max gso segments")
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20211223222441.2975883-2-lixiaoyan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoudp: using datalen to cap ipv6 udp max gso segments
Coco Li [Thu, 23 Dec 2021 22:24:40 +0000 (22:24 +0000)]
udp: using datalen to cap ipv6 udp max gso segments

The max number of UDP gso segments is intended to cap to
UDP_MAX_SEGMENTS, this is checked in udp_send_skb().

skb->len contains network and transport header len here, we should use
only data len instead.

This is the ipv6 counterpart to the below referenced commit,
which missed the ipv6 change

Fixes: 158390e45612 ("udp: using datalen to cap max gso segments")
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20211223222441.2975883-1-lixiaoyan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'mlx5-fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Fri, 24 Dec 2021 03:04:32 +0000 (19:04 -0800)]
Merge tag 'mlx5-fixes-2021-12-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-12-22

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'
  net/mlx5e: Delete forward rule for ct or sample action
  net/mlx5e: Fix ICOSQ recovery flow for XSK
  net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow
  net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled
  net/mlx5e: Wrap the tx reporter dump callback to extract the sq
  net/mlx5: Fix tc max supported prio for nic mode
  net/mlx5: Fix SF health recovery flow
  net/mlx5: Fix error print in case of IRQ request failed
  net/mlx5: Use first online CPU instead of hard coded CPU
  net/mlx5: DR, Fix querying eswitch manager vport for ECPF
  net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
====================

Link: https://lore.kernel.org/r/20211223190441.153012-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Fri, 24 Dec 2021 01:15:23 +0000 (17:15 -0800)]
Merge tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:
 "Three ksmbd fixes, all for stable as well.

  Two fix potential unitialized memory and one fixes a security problem
  where encryption is unitentionally disabled from some clients"

* tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: disable SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
  ksmbd: fix uninitialized symbol 'pntsd_size'
  ksmbd: fix error code in ndr_read_int32()

2 years agoMerge tag 'drm-fixes-2021-12-24' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Thu, 23 Dec 2021 23:43:25 +0000 (15:43 -0800)]
Merge tag 'drm-fixes-2021-12-24' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Happy Xmas. Nothing major, one mediatek and a couple of i915 locking
  fixes. There might be a few stragglers over next week or so but I
  don't expect much before next release.

  mediatek:
   - NULL pointer check

  i915:
   - guc submission locking fixes"

* tag 'drm-fixes-2021-12-24' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/guc: Only assign guc_id.id when stealing guc_id
  drm/i915/guc: Use correct context lock when callig clr_context_registered
  drm/mediatek: hdmi: Perform NULL pointer check for mtk_hdmi_conf

2 years agoMerge tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 23 Dec 2021 23:32:07 +0000 (15:32 -0800)]
Merge tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Single fix for not clearing kiocb->ki_pos back to 0 for a stream,
  destined for stable as well"

* tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block:
  io_uring: zero iocb->ki_pos for stream file types

2 years agoMerge branch 'ucount-rlimit-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 23 Dec 2021 23:27:02 +0000 (15:27 -0800)]
Merge branch 'ucount-rlimit-fixes-for-v5.16' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull ucount fix from Eric Biederman:
 "This fixes a silly logic bug in the ucount rlimits code, where it was
  comparing against the wrong limit"

* 'ucount-rlimit-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Fix rlimit max values check

2 years agoMerge tag 'net-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 23 Dec 2021 18:45:55 +0000 (10:45 -0800)]
Merge tag 'net-5.16-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter.

  Current release - regressions:

   - revert "tipc: use consistent GFP flags"

  Previous releases - regressions:

   - igb: fix deadlock caused by taking RTNL in runtime resume path

   - accept UFOv6 packages in virtio_net_hdr_to_skb

   - netfilter: fix regression in looped (broad|multi)cast's MAC
     handling

   - bridge: fix ioctl old_deviceless bridge argument

   - ice: xsk: do not clear status_error0 for ntu + nb_buffs descriptor,
     avoid stalls when multiple sockets use an interface

  Previous releases - always broken:

   - inet: fully convert sk->sk_rx_dst to RCU rules

   - veth: ensure skb entering GRO are not cloned

   - sched: fix zone matching for invalid conntrack state

   - bonding: fix ad_actor_system option setting to default

   - nf_tables: fix use-after-free in nft_set_catchall_destroy()

   - lantiq_xrx200: increase buffer reservation to avoid mem corruption

   - ice: xsk: avoid leaking app buffers during clean up

   - tun: avoid double free in tun_free_netdev"

* tag 'net-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: stmmac: dwmac-visconti: Fix value of ETHER_CLK_SEL_FREQ_SEL_2P5M
  r8152: sync ocp base
  r8152: fix the force speed doesn't work for RTL8156
  net: bridge: fix ioctl old_deviceless bridge argument
  net: stmmac: ptp: fix potentially overflowing expression
  net: dsa: tag_ocelot: use traffic class to map priority on injected header
  veth: ensure skb entering GRO are not cloned.
  asix: fix wrong return value in asix_check_host_enable()
  asix: fix uninit-value in asix_mdio_read()
  sfc: falcon: Check null pointer of rx_queue->page_ring
  sfc: Check null pointer of rx_queue->page_ring
  net: ks8851: Check for error irq
  drivers: net: smc911x: Check for error irq
  fjes: Check for error irq
  bonding: fix ad_actor_system option setting to default
  igb: fix deadlock caused by taking RTNL in RPM resume path
  gve: Correct order of processing device options
  net: skip virtio_net_hdr_set_proto if protocol already set
  net: accept UFOv6 packages in virtio_net_hdr_to_skb
  docs: networking: replace skb_hwtstamp_tx with skb_tstamp_tx
  ...

2 years agoplatform/x86: intel_pmc_core: fix memleak on registration failure
Johan Hovold [Wed, 22 Dec 2021 10:50:23 +0000 (11:50 +0100)]
platform/x86: intel_pmc_core: fix memleak on registration failure

In case device registration fails during module initialisation, the
platform device structure needs to be freed using platform_device_put()
to properly free all resources (e.g. the device name).

Fixes: 938835aa903a ("platform/x86: intel_pmc_core: do not create a static struct device")
Cc: stable@vger.kernel.org # 5.9
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20211222105023.6205-1-johan@kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2 years agonet: stmmac: dwmac-visconti: Fix value of ETHER_CLK_SEL_FREQ_SEL_2P5M
Nobuhiro Iwamatsu [Thu, 23 Dec 2021 07:36:33 +0000 (16:36 +0900)]
net: stmmac: dwmac-visconti: Fix value of ETHER_CLK_SEL_FREQ_SEL_2P5M

ETHER_CLK_SEL_FREQ_SEL_2P5M is not 0 bit of the register. This is a
value, which is 0. Fix from BIT(0) to 0.

Reported-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Link: https://lore.kernel.org/r/20211223073633.101306-1-nobuhiro1.iwamatsu@toshiba.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'r8152-fix-bugs'
Jakub Kicinski [Thu, 23 Dec 2021 17:56:08 +0000 (09:56 -0800)]
Merge branch 'r8152-fix-bugs'

Hayes Wang says:

====================
r8152: fix bugs

Patch #1 fix the issue of force speed mode for RTL8156.
Patch #2 fix the issue of unexpected ocp_base.
====================

Link: https://lore.kernel.org/r/20211223092702.23841-386-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agor8152: sync ocp base
Hayes Wang [Thu, 23 Dec 2021 09:27:02 +0000 (17:27 +0800)]
r8152: sync ocp base

There are some chances that the actual base of hardware is different
from the value recorded by driver, so we have to reset the variable
of ocp_base to sync it.

Set ocp_base to -1. Then, it would be updated and the new base would be
set to the hardware next time.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agor8152: fix the force speed doesn't work for RTL8156
Hayes Wang [Thu, 23 Dec 2021 09:27:01 +0000 (17:27 +0800)]
r8152: fix the force speed doesn't work for RTL8156

It needs to set mdio force mode. Otherwise, link off always occurs when
setting force speed.

Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>