review.tizen.org Git - platform/kernel/linux-rpi.git/log

scripts/gdb: bail early if there are no clocks

Avoid generating an exception if there are no clocks registered:

(gdb) lx-clk-summary
enable prepare protect
clock count count count rate
------------------------------------------------------------------------
Python Exception <class 'gdb.error'>: No symbol "clk_root_list" in
current context.
Error occurred in Python: No symbol "clk_root_list" in current context.

Link: https://lkml.kernel.org/r/20230323225246.3302977-1-f.fainelli@gmail.com
Fixes: d1e9710b63d8 ("scripts/gdb: initial clk support: lx-clk-summary")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kexec: remove unnecessary arch_kexec_kernel_image_load()

arch_kexec_kernel_image_load() only calls kexec_image_load_default(), and
there are no arch-specific implementations.

Remove the unnecessary arch_kexec_kernel_image_load() and make
kexec_image_load_default() static.

No functional change intended.

Link: https://lkml.kernel.org/r/20230307224416.907040-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

x86/kexec: remove unnecessary arch_kexec_kernel_image_load()

Patch series "kexec: Remove unnecessary arch hook", v2.

There are no arch-specific things in arch_kexec_kernel_image_load(), so
remove it and just use the generic version.

This patch (of 2):

The x86 implementation of arch_kexec_kernel_image_load() is functionally
identical to the generic arch_kexec_kernel_image_load():

  arch_kexec_kernel_image_load                # x86
    if (!image->fops || !image->fops->load)
      return ERR_PTR(-ENOEXEC);
    return image->fops->load(image, image->kernel_buf, ...)

  arch_kexec_kernel_image_load                # generic
    kexec_image_load_default
      if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
      return image->fops->load(image, image->kernel_buf, ...)

Remove the x86-specific version and use the generic
arch_kexec_kernel_image_load().  No functional change intended.

Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org
Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rapidio/tsi721: remove redundant pci_clear_master

Remove pci_clear_master to simplify the code, the bus-mastering is also
cleared in do_pci_disable_device, like this:

./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Link: https://lkml.kernel.org/r/20230323113711.10523-1-cai.huoqing@linux.dev
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel.h: split the hexadecimal related helpers to hex.h

For the sake of cleaning up the kernel.h split the hexadecimal related
helpers to own header called 'hex.h'.

Link: https://lkml.kernel.org/r/20230323155029.40000-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

epoll: use refcount to reduce ep_mutex contention

We are observing huge contention on the epmutex during an http
connection/rate test:

83.17% 0.25%  nginx            [kernel.kallsyms]         [k] entry_SYSCALL_64_after_hwframe
[...]
           |--66.96%--__fput
                      |--60.04%--eventpoll_release_file
                                 |--58.41%--__mutex_lock.isra.6
                                           |--56.56%--osq_lock

The application is multi-threaded, creates a new epoll entry for
each incoming connection, and does not delete it before the
connection shutdown - that is, before the connection's fd close().

Many different threads compete frequently for the epmutex lock,
affecting the overall performance.

To reduce the contention this patch introduces explicit reference counting
for the eventpoll struct. Each registered event acquires a reference,
and references are released at ep_remove() time.

The eventpoll struct is released by whoever - among EP file close() and
and the monitored file close() drops its last reference.

Additionally, this introduces a new 'dying' flag to prevent races between
the EP file close() and the monitored file close().
ep_eventpoll_release() marks, under f_lock spinlock, each epitem as dying
before removing it, while EP file close() does not touch dying epitems.

The above is needed as both close operations could run concurrently and
drop the EP reference acquired via the epitem entry. Without the above
flag, the monitored file close() could reach the EP struct via the epitem
list while the epitem is still listed and then try to put it after its
disposal.

An alternative could be avoiding touching the references acquired via
the epitems at EP file close() time, but that could leave the EP struct
alive for potentially unlimited time after EP file close(), with nasty
side effects.

With all the above in place, we can drop the epmutex usage at disposal time.

Overall this produces a significant performance improvement in the
mentioned connection/rate scenario: the mutex operations disappear from
the topmost offenders in the perf report, and the measured connections/rate
grows by ~60%.

To make the change more readable this additionally renames ep_free() to
ep_clear_and_put(), and moves the actual memory cleanup in a separate
ep_free() helper.

Link: https://lkml.kernel.org/r/4a57788dcaf28f5eb4f8dfddcc3a8b172a7357bb.1679504153.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Xiumei Mu <xmu@redhiat.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

notifiers: add tracepoints to the notifiers infrastructure

Currently there is no way to show the callback names for registered,
unregistered or executed notifiers. This is very useful for debug
purposes, hence add this functionality here in the form of notifiers'
tracepoints, one per operation.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20230314200058.1326909-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Guilherme G. Piccoli <kernel@gpiccoli.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static

smatch reports several warnings
kernel/hung_task.c:31:19: warning:
  symbol 'sysctl_hung_task_check_count' was not declared. Should it be static?
kernel/hung_task.c:50:29: warning:
  symbol 'sysctl_hung_task_check_interval_secs' was not declared. Should it be static?
kernel/hung_task.c:52:19: warning:
  symbol 'sysctl_hung_task_warnings' was not declared. Should it be static?
kernel/hung_task.c:75:28: warning:
  symbol 'sysctl_hung_task_panic' was not declared. Should it be static?

These variables are only used in hung_task.c, so they should be static

Link: https://lkml.kernel.org/r/20230312164645.471259-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Cc: Ben Dooks <ben.dooks@sifive.com>
Cc: fuyuanli <fuyuanli@didiglobal.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: remove the obsolete section EMBEDDED LINUX

By now, many developers are working on Linux for embedded systems. There
is no need to point out single developers. The linux-embedded mailing
list has only little traffic, and most of it is just spam.

Remove this obsolete section.

Link: https://lkml.kernel.org/r/20230308150625.28732-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Cc: Olivia Mackall <olivia@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: ignore ETHTOOL_LINK_MODE_ enum values

Since commit 4104a20646 ("checkpatch: ignore generated CamelCase defines
and enum values") enum values like ETHTOOL_LINK_MODE_Asym_Pause_BIT are
ignored. But there are other enums like
ETHTOOL_LINK_MODE_1000baseT_Full_BIT, which are not ignored because of the
not matching '1000baseT' substring.

Add regex to match all ETHTOOL_LINK_MODE enums.

Link: https://lkml.kernel.org/r/20230104201524.28078-1-gerhard@engleder-embedded.com
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Gerhard Engleder <gerhard@engleder-embedded.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/link-vmlinux.sh: fix error message presentation

This comes out as

Try make KALLSYMS_EXTRA_PASS=1 as a workaround

but we want quotes:

Try "make KALLSYMS_EXTRA_PASS=1" as a workaround

Link: https://lkml.kernel.org/r/202303042034.Cjc7JTd0-lkp@intel.com
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ELF: fix all "Elf" typos

ELF is acronym and therefore should be spelled in all caps.

I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like
being written in the first person.

Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: uninline kstrdup()

gcc inlines kstrdup into kstrdup_const() but it can very efficiently tail
call into it instead:

$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-84 (-84)
Function old new delta
kstrdup_const 119 35 -84

Link: https://lkml.kernel.org/r/Y/4fDlbIhTLNLFHz@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/gdb: support getting current task struct in UML

A running x86 UML kernel reports with architecture "i386:x86-64" as it is
a sub-architecture. However, a difference with bare-metal x86 kernels is
in how it manages tasks and the current task struct. To identify that the
inferior is a UML kernel and not bare-metal, check for the existence of
the UML specific symbol "cpu_tasks" which contains the current task
struct.

Link: https://lkml.kernel.org/r/b839d611e2906ccef2725c34d8e353fab35fe75e.1677469905.git.development@efficientek.com
Signed-off-by: Glenn Washburn <development@efficientek.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/gdb: correct indentation in get_current_task

Patch series "scripts/gdb: Support getting current task struct in UML",
v3.

A running x86 UML kernel reports with architecture "i386:x86-64" as it is
a sub-architecture.  However, a difference with bare-metal x86 kernels is
in how it manages tasks and the current task struct.  To identify that the
inferior is a UML kernel and not bare-metal, check for the existence of
the UML specific symbol "cpu_tasks" which contains the current task
struct.

This patch (of 3):

There is an extra space in a couple blocks in get_current_task.  Though
python does not care, let's make the spacing consistent.  Also, format
better an if expression, removing unneeded parenthesis.

Link: https://lkml.kernel.org/r/cover.1677469905.git.development@efficientek.com
Link: https://lkml.kernel.org/r/2e117b82240de6893f27cb6507242ce455ed7b5b.1677469905.git.development@efficientek.com
Signed-off-by: Glenn Washburn <development@efficientek.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

dca: delete unnecessary variable

It's more readable to just pass NULL directly instead of using a variable
for that.

Link: https://lkml.kernel.org/r/Y/yAlDytLH0ZNLNz@kili
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kcov: improve documentation

Improve KCOV documentation:

- Use KCOV instead of kcov, as the former is more widely-used.

- Mention Clang in compiler requirements.

- Use ``annotations`` for inline code.

- Rework remote coverage collection documentation for better clarity.

- Various smaller changes.

[andreyknvl@google.com: v2]
Link: https://lkml.kernel.org/r/583f41c49eef15210fa813e8229730d11427efa7.1677614637.git.andreyknvl@google.com
[andreyknvl@google.com: fix ``annotation`` for KCOV_REMOTE_ENABLE]
Link: https://lkml.kernel.org/r/72be5c215c275f35891229b90622ed859f196a46.1677684837.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/0b5efd70e31bba7912cf9a6c951f0e76a8df27df.1677517724.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nfs: remove empty if statement from nfs3_prepare_get_acl

Remove empty if statement from nfs3_prepare_get_acl and update comment to
follow the one from the referred fs/posix_acl.c:get_acl().

No functional change intended.

Link: https://lkml.kernel.org/r/20221130151231.3654-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

proc: remove mark_inode_dirty() in .setattr()

procfs' .setattr() has updated i_uid, i_gid and i_mode into proc dirent,
we don't need to call mark_inode_dirty() for delayed update, remove it.

Link: https://lkml.kernel.org/r/20230131150840.34726-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ia64: salinfo: placate defined-but-not-used warning

When CONFIG_PROC_FS is not set, proc_salinfo_show() is not used.  Mark the
function as __maybe_unused to quieten the warning message.

../arch/ia64/kernel/salinfo.c:584:12: warning: 'proc_salinfo_show' defined but not used [-Wunused-function]
  584 | static int proc_salinfo_show(struct seq_file *m, void *v)
      |            ^~~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/20230223034309.13375-1-rdunlap@infradead.org
Fixes: 3f3942aca6da ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ia64: mm/contig: fix section mismatch warning/error

alloc_per_cpu_data() is called by find_memory(), which is marked as
__init. Therefore alloc_per_cpu_data() can also be marked as __init to
remedy this modpost problem.

WARNING: modpost: vmlinux.o: section mismatch in reference: alloc_per_cpu_data (section: .text) -> memblock_alloc_try_nid (section: .init.text)

Link: https://lkml.kernel.org/r/20230223034258.12917-1-rdunlap@infradead.org
Fixes: 4b9ddc7cf272 ("[IA64] Fix section mismatch in contig.c version of per_cpu_init()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

delayacct: improve the average delay precision of getdelay tool to microsecond

Improve the average delay precision of getdelay tool to microsecond.  When
using the getdelay tool, it is sometimes found that the average delay
except CPU is not 0, but display is 0, because the precison is too low.
For example, see delay average of SWAP below when using ZRAM.

print delayacct stats ON
PID 32915
CPU             count     real total  virtual total    delay total  delay average
               339202     2793871936     9233585504        7951112          0.000ms
IO              count    delay total  delay average
                   41      419296904             10ms
SWAP            count    delay total  delay average
               242589     1045792384              0ms

This wrong display is misleading, so improve the millisecond precision of
the average delay to microsecond just like CPU.  Then user would get more
accurate information of delay time.

Link: https://lkml.kernel.org/r/202302131408087983857@zte.com.cn
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'gpio-fixes-for-v6.3-rc6' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix irq handling in gpio-davinci

- fix Kconfig dependencies for gpio-regmap

* tag 'gpio-fixes-for-v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: davinci: Add irq chip flag to skip set wake
  gpio: davinci: Do not clear the bank intr enable bit in save_context
  gpio: GPIO_REGMAP: select REGMAP instead of depending on it

Merge tag 'acpi-6.3-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"Fix the ACPI backlight override mechanism for the cases when
  acpi_backlight=video is set through the kernel command line or a DMI
  quirk and add backlight quirks for Apple iMac14,1 and iMac14,2 and
  Lenovo ThinkPad W530 (Hans de Goede)"

* tag 'acpi-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
  ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
  ACPI: video: Make acpi_backlight=video work independent from GPU driver
  ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
"Fix uninitialised variable warning (from smatch) in the arm64 compat
alignment fixup code"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Work around uninitialized variable warning

Merge tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:
"Four fixes, three for stable:

   - slab out of bounds fix

   - lock cancellation fix

   - minor cleanup to address clang warning

   - fix for xfstest 551 (wrong parms passed to kvmalloc)"

* tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
  ksmbd: delete asynchronous work from list
  ksmbd: remove unused is_char_allowed function
  ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN

Merge tag 'net-6.3-rc6-2' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.

  Current release - regressions:

   - wifi: mac80211:
      - fix potential null pointer dereference
      - fix receiving mesh packets in forwarding=0 networks
      - fix mesh forwarding

  Current release - new code bugs:

   - virtio/vsock: fix leaks due to missing skb owner

  Previous releases - regressions:

   - raw: fix NULL deref in raw_get_next().

   - sctp: check send stream number after wait_for_sndbuf

   - qrtr:
      - fix a refcount bug in qrtr_recvmsg()
      - do not do DEL_SERVER broadcast after DEL_CLIENT

   - wifi: brcmfmac: fix SDIO suspend/resume regression

   - wifi: mt76: fix use-after-free in fw features query.

   - can: fix race between isotp_sendsmg() and isotp_release()

   - eth: mtk_eth_soc: fix remaining throughput regression

   - eth: ice: reset FDIR counter in FDIR init stage

  Previous releases - always broken:

   - core: don't let netpoll invoke NAPI if in xmit context

   - icmp: guard against too small mtu

   - ipv6: fix an uninit variable access bug in __ip6_make_skb()

   - wifi: mac80211: fix the size calculation of
     ieee80211_ie_len_eht_cap()

   - can: fix poll() to not report false EPOLLOUT events

   - eth: gve: secure enough bytes in the first TX desc for all TCP
     pkts"

* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net: stmmac: check fwnode for phy device before scanning for phy
  net: stmmac: Add queue reset into stmmac_xdp_open() function
  selftests: net: rps_default_mask.sh: delete veth link specifically
  net: fec: make use of MDIO C45 quirk
  can: isotp: fix race between isotp_sendsmg() and isotp_release()
  can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
  can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
  can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
  gve: Secure enough bytes in the first TX desc for all TCP pkts
  netlink: annotate lockless accesses to nlk->max_recvmsg_len
  ethtool: reset #lanes when lanes is omitted
  ping: Fix potentail NULL deref for /proc/net/icmp.
  raw: Fix NULL deref in raw_get_next().
  ice: Reset FDIR counter in FDIR init stage
  ice: fix wrong fallback logic for FDIR
  net: stmmac: fix up RX flow hash indirection table when setting channels
  net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
  wifi: mt76: ignore key disable commands
  wifi: ath11k: reduce the MHI timeout to 20s
  ipv6: Fix an uninit variable access bug in __ip6_make_skb()
  ...

Merge tag 'linux-kselftest-fixes-6.3-rc6' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
"One single fix to mount_setattr_test build failure"

* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests mount: Fix mount_setattr_test builds failed

Merge tag 'for-linus-iommufd' of git://git./linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:

- An invalid VA range can be be put in a pages and eventually trigger
   WARN_ON, reject it early

- Use of the wrong start index value when doing the complex batch carry
   scheme

- Wrong store ordering resulting in corrupting data used in a later
   calculation that corrupted the batch structure during carry

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not corrupt the pfn list when doing batch carry
  iommufd: Fix unpinning of pages when an access is present
  iommufd: Check for uptr overflow

Merge tag 'pwm/for-6.3-rc6' of git://git./linux/kernel/git/thierry.reding/linux-pwm

Pull pwm fixes from Thierry Reding:
"These are some fixes to make sure the PWM state structure is always
  initialized to a known state.

  Prior to this it could happen in some situations that random data from
  the stack would leak into the data structure and cause subtle bugs"

* tag 'pwm/for-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: Zero-initialize the pwm_state passed to driver's .get_state()
  pwm: meson: Explicitly set .polarity in .get_state()
  pwm: sprd: Explicitly set .polarity in .get_state()
  pwm: iqs620a: Explicitly set .polarity in .get_state()
  pwm: cros-ec: Explicitly set .polarity in .get_state()
  pwm: hibvt: Explicitly set .polarity in .get_state()

Merge tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"Mostly i915 fixes: dp mst for compression/dsc, perf ioctl uaf, ctx rpm
  accounting, gt reset vs huc loading.

  And a few individual driver fixes: ivpu dma fence&suspend, panfrost
  mmap, nouveau color depth"

* tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm:
  accel/ivpu: Fix S3 system suspend when not idle
  accel/ivpu: Add dma fence to command buffers only
  drm/i915: Fix context runtime accounting
  drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
  drm/i915: Use compressed bpp when calculating m/n value for DP MST DSC
  drm/i915/huc: Cancel HuC delayed load timer on reset.
  drm/i915/ttm: fix sparse warning
  drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
  drm/nouveau/disp: Support more modes by checking with lower bpc

Merge tag 'sound-6.3-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The majority of changes here are various fixes for Intel drivers,
  and there is a change in ASoC PCM core for the format constraints.

  In addition, a workaround for HD-audio HDMI regressions and usual
  HD-audio quirks are found"

* tag 'sound-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
  ALSA: hda/realtek: Add quirk for Clevo X370SNW
  ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
  ASoC: SOF: avoid a NULL dereference with unsupported widgets
  ASoC: da7213.c: add missing pm_runtime_disable()
  ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
  ASoC: codecs: lpass: fix the order or clks turn off during suspend
  ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
  ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
  ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx (8A22)
  ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
  ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
  ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15

Merge tag 'platform-drivers-x86-v6.3-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

-  more think-lmi fixes

-  one DMI quirk addition

* tag 'platform-drivers-x86-v6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list
  platform/x86: think-lmi: Clean up display of current_value on Thinkstation
  platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
  platform/x86: think-lmi: Fix memory leak when showing current settings

Merge tag 'asm-generic-fixes-6.3' of git://git./linux/kernel/git/arnd/asm-generic

Pull asm-generic fixes from Arnd Bergmann:
"These are minor fixes to address false-positive build warnings:

  Some of the less common I/O accessors are missing __force casts and
  cause sparse warnings for their implied byteswap, and a recent change
  to __generic_cmpxchg_local() causes a warning about constant integer
  truncation"

* tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: avoid __generic_cmpxchg_local warnings
  asm-generic/io.h: suppress endianness warnings for relaxed accessors
  asm-generic/io.h: suppress endianness warnings for readq() and writeq()

net: stmmac: check fwnode for phy device before scanning for phy

Some DT devices already have phy device configured in the DT/ACPI.
Current implementation scans for a phy unconditionally even though
there is a phy listed in the DT/ACPI and already attached.

We should check the fwnode if there is any phy device listed in
fwnode and decide whether to scan for a phy to attach to.

Fixes: fe2cfbc96803 ("net: stmmac: check if MAC needs to attach to a PHY")
Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shahab Vahedi <shahab@synopsys.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Add queue reset into stmmac_xdp_open() function

Queue reset was moved out from __init_dma_rx_desc_rings() and
__init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit
and receive packet after XDP prog setup.

This commit adds the missing queue reset into stmmac_xdp_open() function.

Fixes: f9ec5723c3db ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions")
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: rps_default_mask.sh: delete veth link specifically

When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.

Before this patch:
  # ./rps_default_mask.sh
  empty rps_default_mask                                      [ ok ]
  changing rps_default_mask dont affect existing devices      [ ok ]
  changing rps_default_mask dont affect existing netns        [ ok ]
  changing rps_default_mask affect newly created devices      [ ok ]
  changing rps_default_mask don't affect newly child netns[II][ ok ]
  rps_default_mask is 0 by default in child netns             [ ok ]
  RTNETLINK answers: File exists
  changing rps_default_mask in child ns don't affect the main one[ ok ]
  cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
  changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
  [fail] expected 1 found
  changing rps_default_mask in child ns don't affect existing devices[ ok ]

After this patch:
  # ./rps_default_mask.sh
  empty rps_default_mask                                      [ ok ]
  changing rps_default_mask dont affect existing devices      [ ok ]
  changing rps_default_mask dont affect existing netns        [ ok ]
  changing rps_default_mask affect newly created devices      [ ok ]
  changing rps_default_mask don't affect newly child netns[II][ ok ]
  rps_default_mask is 0 by default in child netns             [ ok ]
  changing rps_default_mask in child ns don't affect the main one[ ok ]
  changing rps_default_mask in child ns affects new childns devices[ ok ]
  changing rps_default_mask in child ns don't affect existing devices[ ok ]

Fixes: 3a7d84eae03b ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: make use of MDIO C45 quirk

Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).

Commit 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.

Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).

(*) it seems that some iMX parts may not support C45 transactions either.
The iMX25 and iMX50 Reference Manuals contain similar wording to
the ColdFire Reference Manuals on this.

Fixes: 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-2023-04-05' of git://git./linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.3

mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.

* tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mt76: ignore key disable commands
  wifi: ath11k: reduce the MHI timeout to 20s
  wifi: mt76: mt7921: fix fw used for offload check for mt7922
  wifi: mt76: mt7921: Fix use-after-free in fw features query.
  wifi: brcmfmac: Fix SDIO suspend/resume regression
====================

Link: https://lore.kernel.org/r/20230405105536.4E946C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'linux-can-fixes-for-6.3-20230405' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2023-04-05

The first patch is by Oleksij Rempel and fixes a out-of-bounds memory
access in the j1939 protocol.

The remaining 3 patches target the ISOTP protocol. Oliver Hartkopp
fixes the ISOTP protocol to pass information about dropped PDUs to the
user space via control messages. Michal Sojka's patch fixes poll() to
not forward false EPOLLOUT events. And Oliver Hartkopp fixes a race
condition between isotp_sendsmg() and isotp_release().

* tag 'linux-can-fixes-for-6.3-20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: isotp: fix race between isotp_sendsmg() and isotp_release()
  can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
  can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
  can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
====================

Link: https://lore.kernel.org/r/20230405092444.1802340-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-04-04 (ice)

This series contains updates to ice driver only.

Simei adjusts error path on adding VF Flow Director filters that were
not releasing all resources.

Lingyu adds setting/resetting of VF Flow Director filters counters
during initialization.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
====================

Link: https://lore.kernel.org/r/20230404172306.450880-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'drm-misc-fixes-2023-04-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

* ivpu: DMA fence and suspend fixes
* nouveau: Color-depth fixes
* panfrost: Fix mmap error handling

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230405182855.GA1551@linux-uq9g

ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530

The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used
together with one of the older nvidia binary driver series (the latest
series does not support it), then backlight control does not work.

This is caused by commit 3dbc80a3e4c5 ("ACPI: video: Make backlight
class device registration a separate step (v2)") combined with
commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default").

After these changes the acpi_video# backlight device is only registered
when requested by a GPU driver calling acpi_video_register_backlight()
which the nvidia binary driver does not do.

I realize that using the nvidia binary driver is not a supported use-case
and users can workaround this by adding acpi_backlight=video on the kernel
commandline, but the ThinkPad W530 is a popular model under Linux users,
so it seems worthwhile to add a quirk for this.

I will also email Nvidia asking them to make the driver call
acpi_video_register_backlight() when an internal LCD panel is detected.
So maybe the next maintenance release of the drivers will fix this...

Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2

On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC")
the connection between the GPU and the panel is seen by the GPU driver as
regular DP instead of eDP, causing the GPU driver to never call
acpi_video_register_backlight().

(GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.)

Fix the missing acpi_video# backlight device on these all-in-ones by
adding a acpi_backlight=video DMI quirk, so that video.ko will
immediately register the backlight device instead of waiting for
an acpi_video_register_backlight() call.

Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: video: Make acpi_backlight=video work independent from GPU driver

Commit 3dbc80a3e4c5 ("ACPI: video: Make backlight class device
registration a separate step (v2)") combined with
commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default")

Means that the video.ko code now fully depends on the GPU driver calling
acpi_video_register_backlight() for the acpi_video# backlight class
devices to get registered.

This means that if the GPU driver does not do this, acpi_backlight=video
on the cmdline, or DMI quirks for selecting acpi_video# will not work.

This is a problem on for example Apple iMac14,1 all-in-ones where
the monitor's LCD panel shows up as a regular DP connection instead of
eDP so the GPU driver will not call acpi_video_register_backlight() [1].

Fix this by making video.ko directly register the acpi_video# devices
when these have been explicitly requested either on the cmdline or
through DMI quirks (rather then auto-detection being used).

[1] GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.

Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()

Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.

And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .

While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.

Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

arm64: compat: Work around uninitialized variable warning

Dan reports that smatch complains about a potential uninitialized
variable being used in the compat alignment fixup code.

The logic is not wrong per se, but we do end up using an uninitialized
variable if reading the instruction that triggered the alignment fault
from user space faults, even if the fault ensures that the uninitialized
value doesn't propagate any further.

Given that we just give up and return 1 if any fault occurs when reading
the instruction, let's get rid of the 'success handling' pattern that
captures the fault in a variable and aborts later, and instead, just
return 1 immediately if any of the get_user() calls result in an
exception.

Fixes: 3fc24ef32d3b ("arm64: compat: Implement misalignment fixups for multiword loads")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202304021214.gekJ8yRc-lkp@intel.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230404103625.2386382-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Merge tag 'trace-v6.3-rc5' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix timerlat notification, as it was not triggering the notify to
   users when a new max latency was hit.

- Do not trigger max latency if the tracing is off.

   When tracing is off, the ring buffer is not updated, it does not make
   sense to notify when there's a new max latency detected by the
   tracer, as why that latency happened is not available. The tracing
   logic still runs when the ring buffer is disabled, but it should not
   be triggering notifications.

- Fix race on freeing the synthetic event "last_cmd" variable by adding
   a mutex around it.

- Fix race between reader and writer of the ring buffer by adding
   memory barriers. When the writer is still on the reader page it must
   have its content visible on the buffer before it moves the commit
   index that the reader uses to know how much content is on the page.

- Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and
   _RET_IP_, which gets broken if it is not inlined.

- Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.

   The field formats of trace events are calculated by using
   sizeof(type) and other means by what is passed into the structure
   macros like __field(). The __field() macro is only meant for atom
   types like int, long, short, pointer, etc. It is not meant for
   arrays.

   The code will currently compile with arrays, but then the format
   produced will be inaccurate, and user space parsing tools will break.

   Two bugs have already been fixed, now add code that will make the
   kernel fail to build if another trace event includes this buggy field
   format.

- Fix boot up snapshot code:

   Boot snapshots were triggering when not even asked for on the kernel
   command line. This was caused by two bugs:

    1) It would trigger a snapshot on any instance if one was created
       from the kernel command line.

    2) The error handling would only affect the top level instance.
       So the fact that a snapshot was done on a instance that didn't
       allocate a buffer triggered a warning written into the top level
       buffer, and worse yet, disabled the top level buffer.

- Fix memory leak that was caused when an error was logged in a trace
   buffer instance, and then the buffer instance was removed.

   The allocated error log messages still needed to be freed.

* tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Free error logs of tracing instances
  tracing: Fix ftrace_boot_snapshot command line logic
  tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
  tracing: Error if a trace event has an array for a __field()
  tracing/osnoise: Fix notify new tracing_max_latency
  tracing/timerlat: Notify new max thread latency
  ftrace: Mark get_lock_parent_ip() __always_inline
  ring-buffer: Fix race while reader and writer are on the same page
  tracing/synthetic: Fix races on freeing last_cmd

tracing: Free error logs of tracing instances

When a tracing instance is removed, the error messages that hold errors
that occurred in the instance needs to be freed. The following reports a
memory leak:

# cd /sys/kernel/tracing
# mkdir instances/foo
# echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger
# cat instances/foo/error_log
[  117.404795] hist:sched:sched_switch: error: Couldn't find field
   Command: hist:keys=x
                      ^
# rmdir instances/foo

Then check for memory leaks:

# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810d8ec700 (size 192):
  comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
  hex dump (first 32 bytes):
    60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff  `.ha....`.ha....
    a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00  .0......&.......
  backtrace:
    [<00000000dae26536>] kmalloc_trace+0x2a/0xa0
    [<00000000b2938940>] tracing_log_err+0x277/0x2e0
    [<000000004a0e1b07>] parse_atom+0x966/0xb40
    [<0000000023b24337>] parse_expr+0x5f3/0xdb0
    [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
    [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
    [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
    [<000000002cadc509>] vfs_write+0x162/0x670
    [<0000000059c3b9be>] ksys_write+0xca/0x170
    [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
    [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff888170c35a00 (size 32):
  comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
  hex dump (first 32 bytes):
    0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74  .  Command: hist
    3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00  :keys=x.........
  backtrace:
    [<000000006a747de5>] __kmalloc+0x4d/0x160
    [<000000000039df5f>] tracing_log_err+0x29b/0x2e0
    [<000000004a0e1b07>] parse_atom+0x966/0xb40
    [<0000000023b24337>] parse_expr+0x5f3/0xdb0
    [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
    [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
    [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
    [<000000002cadc509>] vfs_write+0x162/0x670
    [<0000000059c3b9be>] ksys_write+0xca/0x170
    [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
    [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

The problem is that the error log needs to be freed when the instance is
removed.

Link: https://lore.kernel.org/lkml/76134d9f-a5ba-6a0d-37b3-28310b4a1e91@alu.unizg.hr/
Link: https://lore.kernel.org/linux-trace-kernel/20230404194504.5790b95f@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Fixes: 2f754e771b1a6 ("tracing: Have the error logs show up in the proper instances")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

can: isotp: fix race between isotp_sendsmg() and isotp_release()

As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
function in isotp.c might get into a race condition when restoring the
former tx.state from the old_state.

Remove the old_state concept and implement proper locking for the
ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
simplification idea from Hillf Danton.

Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
mechanism from isotp_release() which resolves a potential race between
isotp_sendsmg() and isotp_release().

[1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet

v1: https://lore.kernel.org/all/20230331102114.15164-1-socketcan@hartkopp.net
v2: https://lore.kernel.org/all/20230331123600.3550-1-socketcan@hartkopp.net
    take care of signal interrupts for wait_event_interruptible() in
    isotp_release()
v3: https://lore.kernel.org/all/20230331130654.9886-1-socketcan@hartkopp.net
    take care of signal interrupts for wait_event_interruptible() in
    isotp_sendmsg() in the wait_tx_done case
v4: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
    take care of signal interrupts for wait_event_interruptible() in
    isotp_sendmsg() in ALL cases

Cc: Dae R. Jeong <threeearcat@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: 4f027cba8216 ("can: isotp: split tx timer into transmission and timeout")
Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'drm-intel-fixes-2023-04-05' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v6.3-rc6:
- Fix DP MST DSC M/N calculation to use compressed bpp
- Fix racy use-after-free in perf ioctl
- Fix context runtime accounting
- Fix handling of GT reset during HuC loading
- Fix use of unsigned vm_fault_t for error values

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zg7mzomz.fsf@intel.com

can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events

When using select()/poll()/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, a false EPOLLOUT event
is sometimes returned. This can happen at least after sending a
message which must be split to multiple CAN frames.

The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.

This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.

v2: https://lore.kernel.org/all/20230302092812.320643-1-michal.sojka@cvut.cz
v1: https://lore.kernel.org/all/20230224010659.48420-1-michal.sojka@cvut.cz
https://lore.kernel.org/all/b53a04a2-ba1f-3858-84c1-d3eb3301ae15@hartkopp.net

Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos

isotp.c was still using sock_recv_timestamp() which does not provide
control messages to detect dropped PDUs in the receive path.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access

In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.

Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.

Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

accel/ivpu: Fix S3 system suspend when not idle

Wait for VPU to be idle in ivpu_pm_suspend_cb() before powering off
the device, so jobs are not lost and TDRs are not triggered after
resume.

Fixes: 852be13f3bd3 ("accel/ivpu: Add PM support")
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-3-stanislaw.gruszka@linux.intel.com

accel/ivpu: Add dma fence to command buffers only

Currently job->done_fence is added to every BO handle within a job. If job
handle (command buffer) is shared between multiple submits, KMD will add
the fence in each of them. Then bo_wait_ioctl() executed on command buffer
will exit only when all jobs containing that handle are done.

This creates deadlock scenario for user mode driver in case when job handle
is added as dependency of another job, because bo_wait_ioctl() of first job
will wait until second job finishes, and second job can not finish before
first one.

Having fences added only to job buffer handle allows user space to execute
bo_wait_ioctl() on the job even if it's handle is submitted with other job.

Fixes: cd7272215c44 ("accel/ivpu: Add command buffer submission logic")
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-2-stanislaw.gruszka@linux.intel.com

tracing: Fix ftrace_boot_snapshot command line logic

The kernel command line ftrace_boot_snapshot by itself is supposed to
trigger a snapshot at the end of boot up of the main top level trace
buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
foo that was created by trace_instance=foo,...

The logic was broken where if ftrace_boot_snapshot was by itself, it would
trigger a snapshot for all instances that had tracing enabled, regardless
if it asked for a snapshot or not.

When a snapshot is requested for a buffer, the buffer's
tr->allocated_snapshot is set to true. Use that to know if a trace buffer
wants a snapshot at boot up or not.

Since the top level buffer is part of the ftrace_trace_arrays list,
there's no reason to treat it differently than the other buffers. Just
iterate the list if ftrace_boot_snapshot was specified.

Link: https://lkml.kernel.org/r/20230405022341.895334039@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes: 9c1c251d670bc ("tracing: Allow boot instances to have snapshot buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance

If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.

Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

gve: Secure enough bytes in the first TX desc for all TCP pkts

Non-GSO TCP packets whose SKBs' linear portion did not include the
entire TCP header were not populating the first Tx descriptor with
as many bytes as the vNIC expected. This change ensures that all
TCP packets populate the first descriptor with the correct number of
bytes.

Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: annotate lockless accesses to nlk->max_recvmsg_len

syzbot reported a data-race in data-race in netlink_recvmsg() [1]

Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() also needs protection.

[1]
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
__sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
__do_sys_recvfrom net/socket.c:2212 [inline]
__se_sys_recvfrom net/socket.c:2208 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2720
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x0000000000000000 -> 0x0000000000001000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023

Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: reset #lanes when lanes is omitted

If the number of lanes was forced and then subsequently the user
omits this parameter, the ksettings->lanes is reset. The driver
should then reset the number of lanes to the device's default
for the specified speed.

However, although the ksettings->lanes is set to 0, the mod variable
is not set to true to indicate the driver and userspace should be
notified of the changes.

The consequence is that the same ethtool operation will produce
different results based on the initial state.

If the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
        Speed: 500000Mb/s
        Lanes: 2
        Duplex: Full
        Auto-negotiation: on

then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
$ ethtool swp1 | grep -A 3 'Speed: '
        Speed: 500000Mb/s
        Lanes: 2
        Duplex: Full
        Auto-negotiation: off

While if the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
        Speed: 500000Mb/s
        Lanes: 1
        Duplex: Full
        Auto-negotiation: off

executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
$ ethtool swp1 | grep -A 3 'Speed: '
        Speed: 500000Mb/s
        Lanes: 1
        Duplex: Full
        Auto-negotiation: off

This patch fixes this behavior. Omitting lanes will always results in
the driver choosing the default lane width for the chosen speed. In this
scenario, regardless of the initial state, the end state will be, e.g.,

$ ethtool swp1 | grep -A 3 'Speed: '
        Speed: 500000Mb/s
        Lanes: 2
        Duplex: Full
        Auto-negotiation: off

Fixes: 012ce4dd3102 ("ethtool: Extend link modes settings uAPI with lanes")
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp'

Kuniyuki Iwashima says:

====================
raw/ping: Fix locking in /proc/net/{raw,icmp}.

The first patch fixes a NULL deref for /proc/net/raw and second one fixes
the same issue for ping sockets.

The first patch also converts hlist_nulls to hlist, but this is because
the current code uses sk_nulls_for_each() for lockless readers, instead
of sk_nulls_for_each_rcu() which adds memory barrier, but raw sockets
does not use the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place.

OTOH, the ping sockets already uses sk_nulls_for_each_rcu(), and such
conversion can be posted later for net-next.
====================

Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ping: Fix potentail NULL deref for /proc/net/icmp.

After commit dbca1596bbb0 ("ping: convert to RCU lookups, get rid
of rwlock"), we use RCU for ping sockets, but we should use spinlock
for /proc/net/icmp to avoid a potential NULL deref mentioned in
the previous patch.

Let's go back to using spinlock there.

Note we can convert ping sockets to use hlist instead of hlist_nulls
because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.

Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

raw: Fix NULL deref in raw_get_next().

Dae R. Jeong reported a NULL deref in raw_get_next() [0].

It seems that the repro was running these sequences in parallel so
that one thread was iterating on a socket that was being freed in
another netns.

  unshare(0x40060200)
  r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
  socket$inet_icmp_raw(0x2, 0x3, 0x1)
  pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)

After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we
use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
sockets.  However, we should use spinlock for slow paths to avoid
the NULL deref.

Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
is not reused during iteration in the grace period.  In fact, the
lockless readers do not check the nulls marker with get_nulls_value().
So, SOCK_RAW should use hlist instead of hlist_nulls.

Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
fast paths and sk_for_each() and spinlock for /proc/net/raw.

[0]:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
seq_read+0x224/0x320 fs/seq_file.c:162
pde_read fs/proc/inode.c:316 [inline]
proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
vfs_read+0x31e/0xd30 fs/read_write.c:468
ksys_pread64 fs/read_write.c:665 [inline]
__do_sys_pread64 fs/read_write.c:675 [inline]
__se_sys_pread64 fs/read_write.c:672 [inline]
__x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x478d29
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Dae R. Jeong <threeearcat@gmail.com>
Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"PPC:
   - Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled

  s390:
   - Fix handling of external interrupts in protected guests

  x86:
   - Resample the pending state of IOAPIC interrupts when unmasking them

   - Fix usage of Hyper-V "enlightened TLB" on AMD

   - Small fixes to real mode exceptions

   - Suppress pending MMIO write exits if emulator detects exception

  Documentation:
   - Fix rST syntax"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  docs: kvm: x86: Fix broken field list
  KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
  KVM: s390: pv: fix external interruption loop not always detected
  KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
  KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
  KVM: x86: Suppress pending MMIO write exits if emulator detects exception
  KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
  KVM: irqfd: Make resampler_list an RCU list
  KVM: SVM: Flush Hyper-V TLB when required

Merge tag 'nfsd-6.3-5' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix a crash and a resource leak in NFSv4 COMPOUND processing

- Fix issues with AUTH_SYS credential handling

- Try again to address an NFS/NFSD/SUNRPC build dependency regression

* tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: callback request does not use correct credential for AUTH_SYS
  NFS: Remove "select RPCSEC_GSS_KRB5
  sunrpc: only free unix grouplist after RCU settles
  nfsd: call op_release, even when op_func returns an error
  NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL

docs: kvm: x86: Fix broken field list

Add a missing ":" to fix a broken field list.

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Fixes: ba7bb663f554 ("KVM: x86: Provide per VM capability for disabling PMU virtualization")
Message-Id: <20230331093116.99820-1-itazur@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

asm-generic: avoid __generic_cmpxchg_local warnings

Code that passes a 32-bit constant into cmpxchg() produces a harmless
sparse warning because of the truncation in the branch that is not taken:

fs/erofs/zdata.c: note: in included file (through /home/arnd/arm-soc/arch/arm/include/asm/cmpxchg.h, /home/arnd/arm-soc/arch/arm/include/asm/atomic.h, /home/arnd/arm-soc/include/linux/atomic.h, ...):
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:30:42: warning: cast truncates bits from constant value (5f0edead becomes ad)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:34:44: warning: cast truncates bits from constant value (5f0edead becomes dead)

This was reported as a regression to Matt's recent __generic_cmpxchg_local
patch, though this patch only added more warnings on top of the ones
that were already there.

Rewording the truncation to use an explicit bitmask instead of a cast
to a smaller type avoids the warning but otherwise leaves the code
unchanged.

I had another look at why the cast is even needed for atomic_cmpxchg(),
and as Matt describes the problem here is that atomic_t contains a
signed 'int', but cmpxchg() takes an 'unsigned long' argument, and
converting between the two leads to a 64-bit sign-extension of
negative 32-bit atomics.

I checked the other implementations of arch_cmpxchg() and did not find
any others that run into the same problem as __generic_cmpxchg_local(),
but it's easy to be on the safe side here and always convert the
signed int into an unsigned int when calling arch_cmpxchg(), as this
will work even when any of the arch_cmpxchg() implementations run
into the same problem.

Fixes: 624654152284 ("locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value")
Reviewed-by: Matt Evans <mev@rivosinc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

asm-generic/io.h: suppress endianness warnings for relaxed accessors

Copy the forced type casts from the normal MMIO accessors to suppress
the sparse warnings that point out __raw_readl() returns a native endian
word (just like readl()).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

asm-generic/io.h: suppress endianness warnings for readq() and writeq()

Commit c1d55d50139b ("asm-generic/io.h: Fix sparse warnings on
big-endian architectures") missed fixing the 64-bit accessors.

Arnd explains in the attached link why the casts are necessary, even if
__raw_readq() and __raw_writeq() do not take endian-specific types.

Link: https://lore.kernel.org/lkml/9105d6fc-880b-4734-857d-e3d30b87ccf6@app.fastmail.com/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ice: Reset FDIR counter in FDIR init stage

Reset the FDIR counters when FDIR inits. Without this patch,
when VF initializes or resets, all the FDIR counters are not
cleaned, which may cause unexpected behaviors for future FDIR
rule create (e.g., rule conflict).

Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix wrong fallback logic for FDIR

When adding a FDIR filter, if ice_vc_fdir_set_irq_ctx returns failure,
the inserted fdir entry will not be removed and if ice_vc_fdir_write_fltr
returns failure, the fdir context info for irq handler will not be cleared
which may lead to inconsistent or memory leak issue. This patch refines
failure cases to resolve this issue.

Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Simei Su <simei.su@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

NFSD: callback request does not use correct credential for AUTH_SYS

Currently callback request does not use the credential specified in
CREATE_SESSION if the security flavor for the back channel is AUTH_SYS.

Problem was discovered by pynfs 4.1 DELEG5 and DELEG7 test with error:
DELEG5 st_delegation.testCBSecParms : FAILURE
expected callback with uid, gid == 17, 19, got 0, 0

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFS: Remove "select RPCSEC_GSS_KRB5

If CONFIG_CRYPTO=n (e.g. arm/shmobile_defconfig):

   WARNING: unmet direct dependencies detected for RPCSEC_GSS_KRB5
     Depends on [n]: NETWORK_FILESYSTEMS [=y] && SUNRPC [=y] && CRYPTO [=n]
     Selected by [y]:
     - NFS_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFS_FS [=y]

As NFSv4 can work without crypto enabled, remove the RPCSEC_GSS_KRB5
dependency altogether.

Trond says:
> It is possible to use the NFSv4.1 client with just AUTH_SYS, and
> in fact there are plenty of people out there using only that. The
> fact that RFC5661 gets its knickers in a twist about RPCSEC_GSS
> support is largely irrelevant to those people.
>
> The other issue is that ’select’ enforces the strict dependency
> that if the NFS client is compiled into the kernel, then the
> RPCSEC_GSS and kerberos code needs to be compiled in as well: they
> cannot exist as modules.

Fixes: e57d06527738 ("NFS & NFSD: Update GSS dependencies")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

sunrpc: only free unix grouplist after RCU settles

While the unix_gid object is rcu-freed, the group_info list that it
contains is not. Ensure that we only put the group list reference once
we are really freeing the unix_gid object.

Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2183056
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Fixes: fd5d2f78261b ("SUNRPC: Make server side AUTH_UNIX use lockless lookups")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

net: stmmac: fix up RX flow hash indirection table when setting channels

stmmac_reinit_queues() fails to fix up the RX hash.  Even if the number
of channels gets restricted, the output of `ethtool -x' indicates that
all RX queues are used:

  $ ethtool -l enp0s29f2
  Channel parameters for enp0s29f2:
  Pre-set maximums:
  RX: 8
  TX: 8
  Other: n/a
  Combined: n/a
  Current hardware settings:
  RX: 8
  TX: 8
  Other: n/a
  Combined: n/a
  $ ethtool -x enp0s29f2
  RX flow hash indirection table for enp0s29f2 with 8 RX ring(s):
      0:      0     1     2     3     4     5     6     7
      8:      0     1     2     3     4     5     6     7
  [...]
  $ ethtool -L enp0s29f2 rx 3
  $ ethtool -x enp0s29f2
  RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
      0:      0     1     2     3     4     5     6     7
      8:      0     1     2     3     4     5     6     7
  [...]

Fix this by setting the indirection table according to the number
of specified queues.  The result is now as expected:

  $ ethtool -L enp0s29f2 rx 3
  $ ethtool -x enp0s29f2
  RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
      0:      0     1     2     0     1     2     0     1
      8:      2     0     1     2     0     1     2     0
  [...]

Tested on Intel Elkhart Lake.

Fixes: 0366f7e06a6b ("net: stmmac: add ethtool support for get/set channels")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://lore.kernel.org/r/20230403121120.489138-1-vinschen@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

iommufd: Do not corrupt the pfn list when doing batch carry

If batch->end is 0 then setting npfns[0] before computing the new value of
pfns will fail to adjust the pfn and result in various page accounting
corruptions. It should be ordered after.

This seems to result in various kinds of page meta-data corruption related
failures:

  WARNING: CPU: 1 PID: 527 at mm/gup.c:75 try_grab_folio+0x503/0x740
  Modules linked in:
  CPU: 1 PID: 527 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:try_grab_folio+0x503/0x740
  Code: e3 01 48 89 de e8 6d c1 dd ff 48 85 db 0f 84 7c fe ff ff e8 4f bf dd ff 49 8d 47 ff 48 89 45 d0 e9 73 fe ff ff e8 3d bf dd ff <0f> 0b 31 db e9 d0 fc ff ff e8 2f bf dd ff 48 8b 5d c8 31 ff 48 89
  RSP: 0018:ffffc90000f37908 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 00000000fffffc02 RCX: ffffffff81504c26
  RDX: 0000000000000000 RSI: ffff88800d030000 RDI: 0000000000000002
  RBP: ffffc90000f37948 R08: 000000000003ca24 R09: 0000000000000008
  R10: 000000000003ca00 R11: 0000000000000023 R12: ffffea000035d540
  R13: 0000000000000001 R14: 0000000000000000 R15: ffffea000035d540
  FS:  00007fecbf659740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000200011c3 CR3: 000000000ef66006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   internal_get_user_pages_fast+0xd32/0x2200
   pin_user_pages_fast+0x65/0x90
   pfn_reader_user_pin+0x376/0x390
   pfn_reader_next+0x14a/0x7b0
   pfn_reader_first+0x140/0x1b0
   iopt_area_fill_domain+0x74/0x210
   iopt_table_add_domain+0x30e/0x6e0
   iommufd_device_selftest_attach+0x7f/0x140
   iommufd_test+0x10ff/0x16f0
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Cc: <stable@vger.kernel.org>
Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages")
Link: https://lore.kernel.org/r/3-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Fix unpinning of pages when an access is present

syzkaller found that the calculation of batch_last_index should use
'start_index' since at input to this function the batch is either empty or
it has already been adjusted to cross any accesses so it will start at the
point we are unmapping from.

Getting this wrong causes the unmap to run over the end of the pages
which corrupts pages that were never mapped. In most cases this triggers
the num pinned debugging:

  WARNING: CPU: 0 PID: 557 at drivers/iommu/iommufd/pages.c:294 __iopt_area_unfill_domain+0x152/0x560
  Modules linked in:
  CPU: 0 PID: 557 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__iopt_area_unfill_domain+0x152/0x560
  Code: d2 0f ff 44 8b 64 24 54 48 8b 44 24 48 31 ff 44 89 e6 48 89 44 24 38 e8 fc d3 0f ff 45 85 e4 0f 85 eb 01 00 00 e8 0e d2 0f ff <0f> 0b e8 07 d2 0f ff 48 8b 44 24 38 89 5c 24 58 89 18 8b 44 24 54
  RSP: 0018:ffffc9000108baf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff821e3f85
  RDX: 0000000000000000 RSI: ffff88800faf0000 RDI: 0000000000000002
  RBP: ffffc9000108bd18 R08: 000000000003ca25 R09: 0000000000000014
  R10: 000000000003ca00 R11: 0000000000000024 R12: 0000000000000004
  R13: 0000000000000801 R14: 00000000000007ff R15: 0000000000000800
  FS:  00007f3499ce1740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000243 CR3: 00000000179c2001 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   iopt_area_unfill_domain+0x32/0x40
   iopt_table_remove_domain+0x23f/0x4c0
   iommufd_device_selftest_detach+0x3a/0x90
   iommufd_selftest_destroy+0x55/0x70
   iommufd_object_destroy_user+0xce/0x130
   iommufd_destroy+0xa2/0xc0
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Also add some useful WARN_ON sanity checks.

Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d506 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/2-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Check for uptr overflow

syzkaller found that setting up a map with a user VA that wraps past zero
can trigger WARN_ONs, particularly from pin_user_pages weirdly returning 0
due to invalid arguments.

Prevent creating a pages with a uptr and size that would math overflow.

  WARNING: CPU: 0 PID: 518 at drivers/iommu/iommufd/pages.c:793 pfn_reader_user_pin+0x2e6/0x390
  Modules linked in:
  CPU: 0 PID: 518 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:pfn_reader_user_pin+0x2e6/0x390
  Code: b1 11 e9 25 fe ff ff e8 28 e4 0f ff 31 ff 48 89 de e8 2e e6 0f ff 48 85 db 74 0a e8 14 e4 0f ff e9 4d ff ff ff e8 0a e4 0f ff <0f> 0b bb f2 ff ff ff e9 3c ff ff ff e8 f9 e3 0f ff ba 01 00 00 00
  RSP: 0018:ffffc90000f9fa30 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff821e2b72
  RDX: 0000000000000000 RSI: ffff888014184680 RDI: 0000000000000002
  RBP: ffffc90000f9fa78 R08: 00000000000000ff R09: 0000000079de6f4e
  R10: ffffc90000f9f790 R11: ffff888014185418 R12: ffffc90000f9fc60
  R13: 0000000000000002 R14: ffff888007879800 R15: 0000000000000000
  FS:  00007f4227555740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000043 CR3: 000000000e748005 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   pfn_reader_next+0x14a/0x7b0
   ? interval_tree_double_span_iter_update+0x11a/0x140
   pfn_reader_first+0x140/0x1b0
   iopt_pages_rw_slow+0x71/0x280
   ? __this_cpu_preempt_check+0x20/0x30
   iopt_pages_rw_access+0x2b2/0x5b0
   iommufd_access_rw+0x19f/0x2f0
   iommufd_test+0xd11/0x16f0
   ? write_comp_data+0x2f/0x90
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   ? __pfx_iommufd_fops_ioctl+0x10/0x10
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d506 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/1-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe

In the am65_cpsw_nuss_probe() function's cleanup path, the call to
of_platform_device_destroy() for the common->mdio_dev device is invoked
unconditionally. It is possible that either the MDIO node is not present
in the device-tree, or the MDIO node is disabled in the device-tree. In
both these cases, the MDIO device is not created, resulting in a NULL
pointer dereference when the of_platform_device_destroy() function is
invoked on the common->mdio_dev device on the cleanup path.

Fix this by ensuring that the common->mdio_dev device exists, before
attempting to invoke of_platform_device_destroy().

Fixes: a45cfcc69a25 ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'vfs.misc.fixes.v6.3-rc6' of git://git./linux/kernel/git/vfs/idmapping

Pull vfs fix from Christian Brauner:
"When a mount or mount tree is made shared the vfs allocates new peer
  group ids for all mounts that have no peer group id set. Only mounts
  that aren't marked with MNT_SHARED are relevant here as MNT_SHARED
  indicates that the mount has fully transitioned to a shared mount. The
  peer group id handling is done with namespace lock held.

  On failure, the peer group id settings of mounts for which a new peer
  group id was allocated need to be reverted and the allocated peer
  group id freed. The cleanup_group_ids() helper can identify the mounts
  to cleanup by checking whether a given mount has a peer group id set
  but isn't marked MNT_SHARED. The deallocation always needs to happen
  with namespace lock held to protect against concurrent modifications
  of the propagation settings.

  This fixes the one place where the namespace lock was dropped before
  calling cleanup_group_ids()"

* tag 'vfs.misc.fixes.v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: drop peer group ids under namespace lock

Merge tag 'hyperv-fixes-signed-20230402' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

- Fix a bug in channel allocation for VMbus (Mohammed Gamal)

- Do not allow root partition functionality in CVM (Michael Kelley)

* tag 'hyperv-fixes-signed-20230402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Block root partition functionality in a Confidential VM
Drivers: vmbus: Check for channel allocation before looking up relids

tracing: Error if a trace event has an array for a __field()

A __field() in the TRACE_EVENT() macro is used to set up the fields of the
trace event data. It is for single storage units (word, char, int,
pointer, etc) and not for complex structures or arrays. Unfortunately,
there's nothing preventing the build from accepting:

__field(int, arr[5]);

from building. It will turn into a array value. This use to work fine, as
the offset and size use to be determined by the macro using the field name,
but things have changed and the offset and size are now determined by the
type. So the above would only be size 4, and the next field will be
located 4 bytes from it (instead of 20).

The proper way to declare static arrays is to use the __array() macro.

Instead of __field(int, arr[5]) it should be __array(int, arr, 5).

Add some macro tricks to the building of a trace event from the
TRACE_EVENT() macro such that __field(int, arr[5]) will fail to build. A
comment by the failure will explain why the build failed.

Link: https://lore.kernel.org/lkml/20230306122549.236561-1-douglas.raillard@arm.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230309221302.642e82d9@gandalf.local.home
Reported-by: Douglas RAILLARD <douglas.raillard@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing/osnoise: Fix notify new tracing_max_latency

osnoise/timerlat tracers are reporting new max latency on instances
where the tracing is off, creating inconsistencies between the max
reported values in the trace and in the tracing_max_latency. Thus
only report new tracing_max_latency on active tracing instances.

Link: https://lkml.kernel.org/r/ecd109fde4a0c24ab0f00ba1e9a144ac19a91322.1680104184.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Fixes: dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/timerlat: Notify new max thread latency

timerlat is not reporting a new tracing_max_latency for the thread
latency. The reason is that it is not calling notify_new_max_latency()
function after the new thread latency is sampled.

Call notify_new_max_latency() after computing the thread latency.

Link: https://lkml.kernel.org/r/16e18d61d69073d0192ace07bf61e405cca96e9c.1680104184.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Fixes: dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Mark get_lock_parent_ip() __always_inline

If the compiler decides not to inline this function then preemption
tracing will always show an IP inside the preemption disabling path and
never the function actually calling preempt_{enable,disable}.

Link: https://lore.kernel.org/linux-trace-kernel/20230327173647.1690849-1-john@metanate.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()")
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ring-buffer: Fix race while reader and writer are on the same page

When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.

  Call trace:
   rb_get_reader_page+0x248/0x1300
   rb_buffer_peek+0x34/0x160
   ring_buffer_peek+0xbc/0x224
   peek_next_entry+0x98/0xbc
   __find_next_entry+0xc4/0x1c0
   trace_find_next_entry_inc+0x30/0x94
   tracing_read_pipe+0x198/0x304
   vfs_read+0xb4/0x1e0
   ksys_read+0x74/0x100
   __arm64_sys_read+0x24/0x30
   el0_svc_common.constprop.0+0x7c/0x1bc
   do_el0_svc+0x2c/0x94
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
  tail_page == commit_page == reader_page == {
    .write = 0x100d20,
    .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
    .entries = 0x10004c,
    .real_end = 0x0,
    .page = {
      .time_stamp = 0x857257416af0,
      .commit = 0xd20,  // This page hasn't been full filled.
      // .data[0...0xd20] seems normal.
    }
}

The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.

To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed0c46 ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.

Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes: 77ae365eca89 ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/synthetic: Fix races on freeing last_cmd

Currently, the "last_cmd" variable can be accessed by multiple processes
asynchronously when multiple users manipulate synthetic_events node
at the same time, it could lead to use-after-free or double-free.

This patch add "lastcmd_mutex" to prevent "last_cmd" from being accessed
asynchronously.

================================================================

It's easy to reproduce in the KASAN environment by running the two
scripts below in different shells.

script 1:
        while :
        do
                echo -n -e '\x88' > /sys/kernel/tracing/synthetic_events
        done

script 2:
        while :
        do
                echo -n -e '\xb0' > /sys/kernel/tracing/synthetic_events
        done

================================================================
double-free scenario:

    process A                       process B
-------------------               ---------------
1.kstrdup last_cmd
                                  2.free last_cmd
3.free last_cmd(double-free)

================================================================
use-after-free scenario:

    process A                       process B
-------------------               ---------------
1.kstrdup last_cmd
                                  2.free last_cmd
3.tracing_log_err(use-after-free)

================================================================

Appendix 1. KASAN report double-free:

BUG: KASAN: double-free in kfree+0xdc/0x1d4
Free of addr ***** by task sh/4879
Call trace:
        ...
        kfree+0xdc/0x1d4
        create_or_delete_synth_event+0x60/0x1e8
        trace_parse_run_command+0x2bc/0x4b8
        synth_events_write+0x20/0x30
        vfs_write+0x200/0x830
        ...

Allocated by task 4879:
        ...
        kstrdup+0x5c/0x98
        create_or_delete_synth_event+0x6c/0x1e8
        trace_parse_run_command+0x2bc/0x4b8
        synth_events_write+0x20/0x30
        vfs_write+0x200/0x830
        ...

Freed by task 5464:
        ...
        kfree+0xdc/0x1d4
        create_or_delete_synth_event+0x60/0x1e8
        trace_parse_run_command+0x2bc/0x4b8
        synth_events_write+0x20/0x30
        vfs_write+0x200/0x830
        ...

================================================================
Appendix 2. KASAN report use-after-free:

BUG: KASAN: use-after-free in strlen+0x5c/0x7c
Read of size 1 at addr ***** by task sh/5483
sh: CPU: 7 PID: 5483 Comm: sh
        ...
        __asan_report_load1_noabort+0x34/0x44
        strlen+0x5c/0x7c
        tracing_log_err+0x60/0x444
        create_or_delete_synth_event+0xc4/0x204
        trace_parse_run_command+0x2bc/0x4b8
        synth_events_write+0x20/0x30
        vfs_write+0x200/0x830
        ...

Allocated by task 5483:
        ...
        kstrdup+0x5c/0x98
        create_or_delete_synth_event+0x80/0x204
        trace_parse_run_command+0x2bc/0x4b8
        synth_events_write+0x20/0x30
        vfs_write+0x200/0x830
        ...

Freed by task 5480:
        ...
        kfree+0xdc/0x1d4
        create_or_delete_synth_event+0x74/0x204
        trace_parse_run_command+0x2bc/0x4b8
        synth_events_write+0x20/0x30
        vfs_write+0x200/0x830
        ...

Link: https://lore.kernel.org/linux-trace-kernel/20230321110444.1587-1-Tze-nan.Wu@mediatek.com
Fixes: 27c888da9867 ("tracing: Remove size restriction on synthetic event cmd error logging")
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: "Tom Zanussi" <zanussi@kernel.org>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

gpio: davinci: Add irq chip flag to skip set wake

Add the IRQCHIP_SKIP_SET_WAKE flag since there are no special IRQ Wake
bits that can be set to enable wakeup IRQ.

Fixes: 3d9edf09d452 ("[ARM] 4457/2: davinci: GPIO support")
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

gpio: davinci: Do not clear the bank intr enable bit in save_context

The interrupt enable bits might be set if we want to use the GPIO as
wakeup source. Clearing this will mean disabling of interrupts in the GPIO
banks that we may want to wakeup from.
Thus remove the line that was clearing this bit from the driver's save
context function.

Cc: Devarsh Thakkar <devarsht@ti.com>
Fixes: 0651a730924b ("gpio: davinci: Add support for system suspend/resume PM")
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

wifi: mt76: ignore key disable commands

This helps avoid cleartext leakage of already queued or powersave buffered
packets, when a reassoc triggers the key deletion.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230330091259.61378-1-nbd@nbd.name

wifi: ath11k: reduce the MHI timeout to 20s

Currently ath11k breaks after hibernation, the reason being that ath11k expects
that the wireless device will have power during suspend and the firmware will
continue running. But of course during hibernation the power from the device is
cut off and firmware is not running when resuming, so ath11k will fail.

(The reason why ath11k needs the firmware running is the interaction between
mac80211 and MHI stack, it's a long story and more info in the bugzilla report.)

In SUSE kernels the watchdog timeout is reduced from the default 120 to 60 seconds:

CONFIG_DPM_WATCHDOG_TIMEOUT=60

But as the ath11k MHI timeout is 90 seconds the kernel will crash before will
ath11k will recover in resume callback. To avoid the crash reduce the MHI
timeout to just 20 seconds.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214649
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230329162038.8637-1-kvalo@kernel.org

platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list

From the commit message adding the first s2idle quirks:

> Lenovo laptops that contain NVME SSDs across a variety of generations have
> trouble resuming from suspend to idle when the IOMMU translation layer is
> active for the NVME storage device.
>
> This generally manifests as a large resume delay or page faults. These
> delays and page faults occur as a result of a Lenovo BIOS specific SMI
> that runs during the D3->D0 transition on NVME devices.

Add the DMI ids for another variant of the T14s Gen1, which also needs
the s2idle quirk.

Link: https://lore.kernel.org/all/20220503183420.348-1-mario.limonciello@amd.com/
Link: https://bbs.archlinux.org/viewtopic.php?pid=2084655#p2084655
Signed-off-by: Benjamin Asbach <asbachb.kernel@impl.it>
Tested-by: Benjamin Asbach <asbachb.kernel@impl.it>
Link: https://lore.kernel.org/r/20230331232447.37204-1-asbachb.kernel@impl.it
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

platform/x86: think-lmi: Clean up display of current_value on Thinkstation

On ThinkStations on retrieving the attribute value the BIOS appends the
possible values to the string.
Clean up the display in the current_value_show function so the options
part is not displayed.

Fixes: a40cd7ef22fb ("platform/x86: think-lmi: Add WMI interface support on Lenovo platforms")
Reported by Mario Limoncello <Mario.Limonciello@amd.com>
Link: https://github.com/fwupd/fwupd/issues/5077#issuecomment-1488730526
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230403013120.2105-2-mpearson-lenovo@squebb.ca
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings

My previous commit introduced a memory leak where the item allocated
from tlmi_setting was not freed.
This commit also renames it to avoid confusion with the similarly name
variable in the same function.

Fixes: 8a02d70679fc ("platform/x86: think-lmi: Add possible_values for ThinkStation")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Link: https://lore.kernel.org/lkml/df26ff45-8933-f2b3-25f4-6ee51ccda7d8@gmx.de/T/
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230403013120.2105-1-mpearson-lenovo@squebb.ca
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

platform/x86: think-lmi: Fix memory leak when showing current settings

When retriving a item string with tlmi_setting(), the result has to be
freed using kfree(). In current_value_show() however, malformed
item strings are not freed, causing a memory leak.
Fix this by eliminating the early return responsible for this.

Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Link: https://lore.kernel.org/platform-driver-x86/01e920bc-5882-ba0c-dd15-868bf0eca0b8@alu.unizg.hr/T/#t
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Fixes: 0fdf10e5fc96 ("platform/x86: think-lmi: Split current_value to reflect only the value")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230331213319.41040-1-W_Armin@gmx.de
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

ipv6: Fix an uninit variable access bug in __ip6_make_skb()

Syzbot reported a bug as following:

=====================================================
BUG: KMSAN: uninit-value in arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
BUG: KMSAN: uninit-value in arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
BUG: KMSAN: uninit-value in atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
BUG: KMSAN: uninit-value in __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
__ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
ip6_finish_skb include/net/ipv6.h:1122 [inline]
ip6_push_pending_frames+0x10e/0x550 net/ipv6/ip6_output.c:1987
rawv6_push_pending_frames+0xb12/0xb90 net/ipv6/raw.c:579
rawv6_sendmsg+0x297e/0x2e60 net/ipv6/raw.c:922
inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
__sys_sendmsg net/socket.c:2559 [inline]
__do_sys_sendmsg net/socket.c:2568 [inline]
__se_sys_sendmsg net/socket.c:2566 [inline]
__x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Uninit was created at:
slab_post_alloc_hook mm/slab.h:766 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
kmalloc_reserve net/core/skbuff.c:492 [inline]
__alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
alloc_skb include/linux/skbuff.h:1270 [inline]
__ip6_append_data+0x51c1/0x6bb0 net/ipv6/ip6_output.c:1684
ip6_append_data+0x411/0x580 net/ipv6/ip6_output.c:1854
rawv6_sendmsg+0x2882/0x2e60 net/ipv6/raw.c:915
inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
__sys_sendmsg net/socket.c:2559 [inline]
__do_sys_sendmsg net/socket.c:2568 [inline]
__se_sys_sendmsg net/socket.c:2566 [inline]
__x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

It is because icmp6hdr does not in skb linear region under the scenario
of SOCK_RAW socket. Access icmp6_hdr(skb)->icmp6_type directly will
trigger the uninit variable access bug.

Use a local variable icmp6_type to carry the correct value in different
scenarios.

Fixes: 14878f75abd5 ("[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]")
Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=3d605ec1d0a7f2a269a1a6936ac7f2b85975ee9c
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT

On the remote side, when QRTR socket is removed, af_qrtr will call
qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours
including local NS. NS upon receiving the DEL_CLIENT packet, will remove
the lookups associated with the node:port and broadcasts the DEL_SERVER
packet.

But on the host side, due to the arrival of the DEL_CLIENT packet, the NS
would've already deleted the server belonging to that port. So when the
remote's NS again broadcasts the DEL_SERVER for that port, it throws below
error message on the host:

"failed while handling packet from 2:-2"

So fix this error by not broadcasting the DEL_SERVER packet when the
DEL_CLIENT packet gets processed."

Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Ram Kumar Dharuman <quic_ramd@quicinc.com>
Signed-off-by: Sricharan Ramabadhran <quic_srichara@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sfp: add quirk enabling 2500Base-x for HG MXPD-483II

The HG MXPD-483II 1310nm SFP module is meant to operate with 2500Base-X,
however, in their EEPROM they incorrectly specify:
    Transceiver type                          : Ethernet: 1000BASE-LX
    ...
    BR, Nominal                               : 2600MBd

Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.

https://forum.banana-pi.org/t/bpi-r3-sfp-module-compatibility/14573/60

Reported-by: chowtom <chowtom@gmail.com>
Tested-by: chowtom <chowtom@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm/i915: Fix context runtime accounting

When considering whether to mark one context as stopped and another as
started we need to look at whether the previous and new _contexts_ are
different and not just requests. Otherwise the software tracked context
start time was incorrectly updated to the most recent lite-restore time-
stamp, which was in some cases resulting in active time going backward,
until the context switch (typically the heartbeat pulse) would synchronise
with the hardware tracked context runtime. Easiest use case to observe
this behaviour was with a full screen clients with close to 100% engine
load.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: bb6287cb1886 ("drm/i915: Track context current active time")
Cc: <stable@vger.kernel.org> # v5.19+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320151423.1708436-1-tvrtko.ursulin@linux.intel.com
[tursulin: Fix spelling in commit msg.]
(cherry picked from commit b3e70051879c665acdd3a1ab50d0ed58d6a8001f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>