review.tizen.org Git - platform/kernel/linux-rpi.git/log

livepatch: Use kallsyms_on_each_match_symbol() to improve performance

Based on the test results of kallsyms_on_each_match_symbol() and
kallsyms_on_each_symbol(), the average performance can be improved by
more than 1500 times.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

kallsyms: Add helper kallsyms_on_each_match_symbol()

Function kallsyms_on_each_symbol() traverses all symbols and submits each
symbol to the hook 'fn' for judgment and processing. For some cases, the
hook actually only handles the matched symbol, such as livepatch.

Because all symbols are currently sorted by name, all the symbols with the
same name are clustered together. Function kallsyms_lookup_names() gets
the start and end positions of the set corresponding to the specified
name. So we can easily and quickly traverse all the matches.

The test results are as follows (twice): (x86)
kallsyms_on_each_match_symbol: 7454, 7984
kallsyms_on_each_symbol : 11733809, 11785803

kallsyms_on_each_match_symbol() consumes only 0.066% of
kallsyms_on_each_symbol()'s time. In other words, 1523x better
performance.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[]

kallsyms_seqs_of_names[] records the symbol index sorted by address, the
maximum value in kallsyms_seqs_of_names[] is the number of symbols. And
2^24 = 16777216, which means that three bytes are enough to store the
index. This can help us save (1 * kallsyms_num_syms) bytes of memory.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

kallsyms: Correctly sequence symbols when CONFIG_LTO_CLANG=y

LLVM appends various suffixes for local functions and variables, suffixes
observed:
- foo.llvm.[0-9a-f]+
- foo.[0-9a-f]+

Therefore, when CONFIG_LTO_CLANG=y, kallsyms_lookup_name() needs to
truncate the suffix of the symbol name before comparing the local function
or variable name.

Old implementation code:
- if (strcmp(namebuf, name) == 0)
- return kallsyms_sym_address(i);
- if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
- return kallsyms_sym_address(i);

The preceding process is traversed by address from low to high. That is,
for those with the same name after the suffix is removed, the one with
the smallest address is returned first. Therefore, when sorting in the
tool, if the raw names are the same, they should be sorted by address in
ascending order.

ASCII[.]   = 2e
ASCII[0-9] = 30,39
ASCII[A-Z] = 41,5a
ASCII[_]   = 5f
ASCII[a-z] = 61,7a

According to the preceding ASCII code values, the following sorting result
is strictly followed.
---------------------------------
|    main-key     |    sub-key    |
|---------------------------------|
|                 |  addr_lowest  |
| <name>          |      ...      |
| <name>.<suffix> |      ...      |
|                 |  addr_highest |
|---------------------------------|
| <name>?<others> |               |   //? is [_A-Za-z0-9]
---------------------------------

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

kallsyms: Improve the performance of kallsyms_lookup_name()

Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one, and then use the expanded string for
comparison. It's O(n).

If we sort names in ascending order like addresses, we can also use
binary search. It's O(log(n)).

In order not to change the implementation of "/proc/kallsyms", the table
kallsyms_names[] is still stored in a one-to-one correspondence with the
address in ascending order.

Add array kallsyms_seqs_of_names[], it's indexed by the sequence number
of the sorted names, and the corresponding content is the sequence number
of the sorted addresses. For example:
Assume that the index of NameX in array kallsyms_seqs_of_names[] is 'i',
the content of kallsyms_seqs_of_names[i] is 'k', then the corresponding
address of NameX is kallsyms_addresses[k]. The offset in kallsyms_names[]
is get_symbol_offset(k).

Note that the memory usage will increase by (4 * kallsyms_num_syms)
bytes, the next two patches will reduce (1 * kallsyms_num_syms) bytes
and properly handle the case CONFIG_LTO_CLANG=y.

Performance test results: (x86)
Before:
min=234, max=10364402, avg=5206926
min=267, max=11168517, avg=5207587
After:
min=1016, max=90894, avg=7272
min=1014, max=93470, avg=7293

The average lookup performance of kallsyms_lookup_name() improved 715x.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

scripts/kallsyms: rename build_initial_tok_table()

Except for the function build_initial_tok_table(), no token abbreviation
is used elsewhere.

$ cat scripts/kallsyms.c | grep tok | wc -l
33
$ cat scripts/kallsyms.c | grep token | wc -l
31

Here, it would be clearer to use the full name.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

module: Fix NULL vs IS_ERR checking for module_get_next_page

The module_get_next_page() function return error pointers on error
instead of NULL.
Use IS_ERR() to check the return value to fix this.

Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

kernel/params.c: defer most of param_sysfs_init() to late_initcall time

param_sysfs_init(), and in particular param_sysfs_builtin() is rather
time-consuming; for my board, it currently takes about 30ms.

That amounts to about 3% of the time budget I have from U-Boot hands
over control to linux and linux must assume responsibility for keeping
the external watchdog happy.

We must still continue to initialize module_kset at subsys_initcall
time, since otherwise any request_module() would fail in
mod_sysfs_init(). However, the bulk of the work in
param_sysfs_builtin(), namely populating /sys/module/*/version and/or
/sys/module/*/parameters/ for builtin modules, can be deferred to
late_initcall time - there's no userspace yet anyway to observe
contents of /sys or the lack thereof.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

module: Remove unused macros module_addr_min/max

Unused macros reported by [-Wunused-macros].

These macros are introduced to record the bound address of modules.

Commit 80b8bf436990 ("module: Always have struct mod_tree_root") made
"struct mod_tree_root" always present and its members addr_min and
addr_max can be directly accessed.

Macros module_addr_min and module_addr_min are not used anymore, so remove
them.

Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mcgrof: massaged the commit messsage as suggested by Miroslav]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

module: remove redundant module_sysfs_initialized variable

The variable module_sysfs_initialized is used for checking whether
module_kset has been initialized. Checking module_kset itself works
just fine for that.

This is a leftover from commit 7405c1e15edf ("kset: convert /sys/module
to use kset_create").

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
[mcgrof: adjusted commit log as suggested by Christophe Leroy]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Merge tag 'perf-tools-fixes-for-v6.1-2-2022-11-10' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix 'perf stat' crash with --per-node --metric-only in CSV mode, due
   to the AGGR_NODE slot in the 'aggr_header_csv' array not being set.

- Fix printing prefix in CSV output of 'perf stat' metrics in interval
   mode (-I), where an extra separator was being added to the start of
   some lines.

- Fix skipping branch stack sampling 'perf test' entry, that was using
   both --branch-any and --branch-filter, which can't be used together.

* tag 'perf-tools-fixes-for-v6.1-2-2022-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Add the include/perf/ directory to .gitignore
  perf test: Fix skipping branch stack sampling test
  perf stat: Fix printing os->prefix in CSV metrics output
  perf stat: Fix crash with --per-node --metric-only in CSV mode

Merge tag 'riscv-for-linus-6.1-rc5' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A fix to add the missing PWM LEDs into the SiFive HiFive Unleashed
   device tree.

- A fix to fully clear a task's registers on creation, as they end up
   in userspace and thus leak kernel memory.

- A pair of VDSO-related build fixes that manifest on recent LLVM-based
   toolchains.

- A fix to our early init to ensure the DT is adequately processed
   before reserved memory nodes are processed.

* tag 'riscv-for-linus-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: vdso: Do not add missing symbols to version section in linker script
  riscv: fix reserved memory setup
  riscv: vdso: fix build with llvm
  riscv: process: fix kernel info leakage
  riscv: dts: sifive unleashed: Add PWM controlled LEDs

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm
"This is a pretty large diffstat for this time of the release. The main
  culprit is a reorganization of the AMD assembly trampoline, allowing
  percpu variables to be accessed early.

  This is needed for the return stack depth tracking retbleed mitigation
  that will be in 6.2, but it also makes it possible to tighten the IBRS
  restore on vmexit. The latter change is a long tail of the
  spectrev2/retbleed patches (the corresponding Intel change was simpler
  and went in already last June), which is why I am including it right
  now instead of sharing a topic branch with tip.

  Being assembly and being rich in comments makes the line count balloon
  a bit, but I am pretty confident in the change (famous last words)
  because the reorganization actually makes everything simpler and more
  understandable than before. It has also had external review and has
  been tested on the aforementioned 6.2 changes, which explode quite
  brutally without the fix.

  Apart from this, things are pretty normal.

  s390:

   - PCI fix

   - PV clock fix

  x86:

   - Fix clash between PMU MSRs and other MSRs

   - Prepare SVM assembly trampoline for 6.2 retbleed mitigation and
     for...

   - ... tightening IBRS restore on vmexit, moving it before the first
     RET or indirect branch

   - Fix log level for VMSA dump

   - Block all page faults during kvm_zap_gfn_range()

  Tools:

   - kvm_stat: fix incorrect detection of debugfs

   - kvm_stat: update vmexit definitions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range()
  KVM: x86/pmu: Limit the maximum number of supported AMD GP counters
  KVM: x86/pmu: Limit the maximum number of supported Intel GP counters
  KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet
  KVM: SVM: Only dump VMSA to klog at KERN_DEBUG level
  tools/kvm_stat: update exit reasons for vmx/svm/aarch64/userspace
  tools/kvm_stat: fix incorrect detection of debugfs
  x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callers
  KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly
  KVM: SVM: restore host save area from assembly
  KVM: SVM: move guest vmsave/vmload back to assembly
  KVM: SVM: do not allocate struct svm_cpu_data dynamically
  KVM: SVM: remove dead field from struct svm_cpu_data
  KVM: SVM: remove unused field from struct vcpu_svm
  KVM: SVM: retrieve VMCB from assembly
  KVM: SVM: adjust register allocation for __svm_vcpu_run()
  KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm
  KVM: x86: use a separate asm-offsets.c file
  KVM: s390: pci: Fix allocation size of aift kzdev elements
  KVM: s390: pv: don't allow userspace to set the clock under PV

Merge tag 'hyperv-fixes-signed-20221110' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

- Fix TSC MSR write for root partition (Anirudh Rayabharam)

- Fix definition of vector in pci-hyperv driver (Dexuan Cui)

- A few other misc patches

* tag 'hyperv-fixes-signed-20221110' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  PCI: hv: Fix the definition of vector in hv_compose_msi_msg()
  MAINTAINERS: remove sthemmin
  x86/hyperv: fix invalid writes to MSRs during root partition kexec
  clocksource/drivers/hyperv: add data structure for reference TSC MSR
  Drivers: hv: fix repeated words in comments
  x86/hyperv: Remove BUG_ON() for kmap_local_page()

Merge tag 'dmaengine-fix-6.1' of git://git./linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
"Misc minor driver fixes and a big pile of at_hdmac driver fixes. More
  work on this driver is done and sitting in next:

   - Pile of at_hdmac driver rework which fixes many long standing
     issues for this driver.

   - couple of stm32 driver fixes for clearing structure and race fix

   - idxd fixes for RO device state and batch size

   - ti driver mem leak fix

   - apple fix for grabbing channels in xlate

   - resource leak fix in mv xor"

* tag 'dmaengine-fix-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (24 commits)
  dmaengine: at_hdmac: Check return code of dma_async_device_register
  dmaengine: at_hdmac: Fix impossible condition
  dmaengine: at_hdmac: Don't allow CPU to reorder channel enable
  dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors
  dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware
  dmaengine: at_hdmac: Fix concurrency over the active list
  dmaengine: at_hdmac: Free the memset buf without holding the chan lock
  dmaengine: at_hdmac: Fix concurrency over descriptor
  dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()
  dmaengine: at_hdmac: Protect atchan->status with the channel lock
  dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all
  dmaengine: at_hdmac: Fix premature completion of desc in issue_pending
  dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending
  dmaengine: at_hdmac: Don't start transactions at tx_submit level
  dmaengine: at_hdmac: Fix at_lli struct definition
  dmaengine: stm32-dma: fix potential race between pause and resume
  dmaengine: ti: k3-udma-glue: fix memory leak when register device fail
  dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove()
  dmaengine: apple-admac: Fix grabbing of channels in of_xlate
  dmaengine: idxd: fix RO device state error after been disabled/reset
  ...

Merge tag 'spi-fix-v6.1-rc4' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A relatively large batch of fixes here but all device specific, plus
  an update to MAINTAINERS.

  The summary print change to the STM32 driver is fixing an issue where
  the driver could easily end up spamming the logs with something that
  should be a debug message"

* tag 'spi-fix-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: amd: Fix SPI_SPD7 value
  spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run
  spi: meson-spicc: fix do_div build error on non-arm64
  spi: intel: Use correct mask for flash and protected regions
  spi: mediatek: Fix package division error
  spi: tegra210-quad: Don't initialise DMA if not supported
  MAINTAINERS: Update HiSilicon SFC Driver maintainer
  spi: meson-spicc: move wait completion in driver to take bursts delay in account
  spi: stm32: Print summary 'callbacks suppressed' message

Merge tag 'mmc-v6.1-rc4' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:

- Provide helper for resetting both SDHCI and CQHCI

- Fix reset for CQHCI (am654, brcmstb, esdhc-imx, of-arasan, tegra)

- Fixup support for MMC_CAP_8_BIT_DATA (esdhc-imx)

* tag 'mmc-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-esdhc-imx: use the correct host caps for MMC_CAP_8_BIT_DATA
  mmc: sdhci_am654: Fix SDHCI_RESET_ALL for CQHCI
  mmc: sdhci-tegra: Fix SDHCI_RESET_ALL for CQHCI
  mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI
  mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
  mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI
  mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI

Merge tag 'for-linus-2022111101' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- fix for memory leak (on error path) in Hyper-V driver (Yang
   Yingliang)

- regression fix for handling 3rd barrel switch emulation in Wacom
   driver (Jason Gerecke)

* tag 'for-linus-2022111101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: wacom: Fix logic used for 3rd barrel switch emulation
  HID: hyperv: fix possible memory leak in mousevsc_probe()
  HID: asus: Remove unused variable in asus_report_tool_width()

Merge tag 'sound-6.1-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Things look calming down, as this contains only a few small fixes:

   - Fix for a corner-case bug with SG-buffer page allocation helper

   - A regression fix for Roland USB-audio device probe

   - A potential memory leak fix at the error path

   - Handful quirks and device-specific fixes for HD- and USB-audio"

* tag 'sound-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: fix potential memleak in 'add_widget_node'
  ALSA: memalloc: Don't fall back for SG-buffer with IOMMU
  ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue
  ALSA: hda/realtek: Add Positivo C6300 model quirk
  ALSA: usb-audio: Add DSD support for Accuphase DAC-60
  ALSA: usb-audio: Add quirk entry for M-Audio Micro
  ALSA: hda/hdmi - enable runtime pm for more AMD display audio
  ALSA: usb-audio: Remove redundant workaround for Roland quirk
  ALSA: usb-audio: Yet more regression for for the delayed card registration
  ALSA: hda/ca0132: add quirk for EVGA Z390 DARK
  ALSA: hda: clarify comments on SCF changes
  ALSA: arm: pxa: pxa2xx-ac97-lib: fix return value check of platform_get_irq()
  ALSA: hda/realtek: Add quirk for ASUS Zenbook using CS35L41

Merge tag 'drm-fixes-2022-11-11' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Weekly pull request for graphics, mostly amdgpu and i915, with a
  couple of fixes for vc4 and panfrost, panel quirks and a kconfig
  change for rcar-du. Nothing seems to be too strange at this stage.

  amdgpu:
   - Fix s/r in amdgpu_vram_mgr_new
   - SMU 13.0.4 update
   - GPUVM TLB race fix
   - DCN 3.1.4 fixes
   - DCN 3.2.x fixes
   - Vega10 fan fix
   - BACO fix for Beige Goby board
   - PSR fix
   - GPU VM PT locking fixes

  amdkfd:
   - CRIU fixes

  vc4:
   - HDMI fixes to vc4.

  panfrost:
   - Make panfrost's uapi header compile with C++.
   - Handle 1 gb boundary correctly in panfrost mmu code.

  panel:
   - Add rotation quirks for 2 panels.

  rcar-du:
   - DSI Kconfig fix

  i915:
   - Fix sg_table handling in map_dma_buf
   - Send PSR update also on invalidate
   - Do not set cache_dirty for DGFX
   - Restore userptr probe_range behaviour"

* tag 'drm-fixes-2022-11-11' of git://anongit.freedesktop.org/drm/drm: (29 commits)
  drm/amd/display: only fill dirty rectangles when PSR is enabled
  drm/amdgpu: disable BACO on special BEIGE_GOBY card
  drm/amdgpu: Drop eviction lock when allocating PT BO
  drm/amdgpu: Unlock bo_list_mutex after error handling
  Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly""
  drm/amd/display: Enforce minimum prefetch time for low memclk on DCN32
  drm/amd/display: Fix gpio port mapping issue
  drm/amd/display: Fix reg timeout in enc314_enable_fifo
  drm/amd/display: Fix FCLK deviation and tool compile issues
  drm/amd/display: Zeromem mypipe heap struct before using it
  drm/amd/display: Update SR watermarks for DCN314
  drm/amdgpu: workaround for TLB seq race
  drm/amdkfd: Fix error handling in criu_checkpoint
  drm/amdkfd: Fix error handling in kfd_criu_restore_events
  drm/amd/pm: update SMU IP v13.0.4 msg interface header
  drm: rcar-du: Fix Kconfig dependency between RCAR_DU and RCAR_MIPI_DSI
  drm/panfrost: Split io-pgtable requests properly
  drm/amdgpu: Fix the lpfn checking condition in drm buddy
  drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017)
  drm: panel-orientation-quirks: Add quirk for Nanote UMPC-01
  ...

KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range()

When zapping a GFN range, pass 0 => ALL_ONES for the to-be-invalidated
range to effectively block all page faults while the zap is in-progress.
The invalidation helpers take a host virtual address, whereas zapping a
GFN obviously provides a guest physical address and with the wrong unit
of measurement (frame vs. byte).

Alternatively, KVM could walk all memslots to get the associated HVAs,
but thanks to SMM, that would require multiple lookups. And practically
speaking, kvm_zap_gfn_range() usage is quite rare and not a hot path,
e.g. MTRR and CR0.CD are almost guaranteed to be done only on vCPU0
during boot, and APICv inhibits are similarly infrequent operations.

Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range")
Reported-by: Chao Peng <chao.p.peng@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221111001841.2412598-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'net-6.1-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi, can and bpf.

  Current release - new code bugs:

   - can: af_can: can_exit(): add missing dev_remove_pack() of
     canxl_packet

  Previous releases - regressions:

   - bpf, sockmap: fix the sk->sk_forward_alloc warning

   - wifi: mac80211: fix general-protection-fault in
     ieee80211_subif_start_xmit()

   - can: af_can: fix NULL pointer dereference in can_rx_register()

   - can: dev: fix skb drop check, avoid o-o-b access

   - nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()

  Previous releases - always broken:

   - bpf: fix wrong reg type conversion in release_reference()

   - gso: fix panic on frag_list with mixed head alloc types

   - wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker()

   - wifi: mac80211: set TWT Information Frame Disabled bit as 1

   - eth: macsec offload related fixes, make sure to clear the keys from
     memory

   - tun: fix memory leaks in the use of napi_get_frags

   - tun: call napi_schedule_prep() to ensure we own a napi

   - tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent

   - ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to
     network

   - tipc: fix a msg->req tlv length check

   - sctp: clear out_curr if all frag chunks of current msg are pruned,
     avoid list corruption

   - mctp: fix an error handling path in mctp_init(), avoid leaks"

* tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  eth: sp7021: drop free_netdev() from spl2sw_init_netdev()
  MAINTAINERS: Move Vivien to CREDITS
  net: macvlan: fix memory leaks of macvlan_common_newlink
  ethernet: tundra: free irq when alloc ring failed in tsi108_open()
  net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()
  ethernet: s2io: disable napi when start nic failed in s2io_card_up()
  net: atlantic: macsec: clear encryption keys from the stack
  net: phy: mscc: macsec: clear encryption keys when freeing a flow
  stmmac: dwmac-loongson: fix missing of_node_put() while module exiting
  stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe()
  stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting
  cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open()
  mctp: Fix an error handling path in mctp_init()
  stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz
  net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()
  net: cpsw: disable napi in cpsw_ndo_open()
  iavf: Fix VF driver counting VLAN 0 filters
  ice: Fix spurious interrupt during removal of trusted VF
  net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions
  net/mlx5e: E-Switch, Fix comparing termination table instance
  ...

Merge tag 'mlx5-fixes-2022-11-09' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-11-02

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2022-11-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions
  net/mlx5e: E-Switch, Fix comparing termination table instance
  net/mlx5e: TC, Fix wrong rejection of packet-per-second policing
  net/mlx5e: Fix tc acts array not to be dependent on enum order
  net/mlx5e: Fix usage of DMA sync API
  net/mlx5e: Add missing sanity checks for max TX WQE size
  net/mlx5: fw_reset: Don't try to load device in case PCI isn't working
  net/mlx5: E-switch, Set to legacy mode if failed to change switchdev mode
  net/mlx5: Allow async trigger completion execution on single CPU systems
  net/mlx5: Bridge, verify LAG state when adding bond to bridge
====================

Link: https://lore.kernel.org/r/20221109184050.108379-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-09 (ice, iavf)

This series contains updates to ice and iavf drivers.

Norbert stops disabling VF queues that are not enabled for ice driver.

Michal stops accounting of VLAN 0 filter to match expectations of PF
driver for iavf.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix VF driver counting VLAN 0 filters
ice: Fix spurious interrupt during removal of trusted VF
====================

Link: https://lore.kernel.org/r/20221110003744.201414-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: sp7021: drop free_netdev() from spl2sw_init_netdev()

It's not necessary to free netdev allocated with devm_alloc_etherdev()
and using free_netdev() leads to double free.

Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20221109150116.2988194-1-weiyongjun@huaweicloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'drm-intel-fixes-2022-11-10' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix sg_table handling in map_dma_buf (Matthew Auld)
- Send PSR update also on invalidate (Jouni Högander)
- Do not set cache_dirty for DGFX (Niranjana Vishwanathapura)
- Restore userptr probe_range behaviour (Matthew Auld)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y2zCy5q85qE9W0J8@tursulin-desk

RISC-V: vdso: Do not add missing symbols to version section in linker script

Recently, ld.lld moved from '--undefined-version' to
'--no-undefined-version' as the default, which breaks the compat vDSO
build:

  ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_gettimeofday' failed: symbol not defined
  ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_gettime' failed: symbol not defined
  ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_getres' failed: symbol not defined

These symbols are not present in the compat vDSO or the regular vDSO for
32-bit but they are unconditionally included in the version section of
the linker script, which is prohibited with '--no-undefined-version'.

Fix this issue by only including the symbols that are actually exported
in the version section of the linker script.

Link: https://github.com/ClangBuiltLinux/linux/issues/1756
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221108171324.3377226-1-nathan@kernel.org/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

riscv: fix reserved memory setup

Currently, RISC-V sets up reserved memory using the "early" copy of the
device tree. As a result, when trying to get a reserved memory region
using of_reserved_mem_lookup(), the pointer to reserved memory regions
is using the early, pre-virtual-memory address which causes a kernel
panic when trying to use the buffer's name:

Unable to handle kernel paging request at virtual address 00000000401c31ac
Oops [#1]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
epc : string+0x4a/0xea
  ra : vsnprintf+0x1e4/0x336
epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
  gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
  t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
  s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
  a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
  a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
  s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
  s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
  s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
  s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
  t5 : ffffffff812f3618 t6 : ffffffff81203d08
status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
[<ffffffff80338936>] vsnprintf+0x1e4/0x336
[<ffffffff80055ae2>] vprintk_store+0xf6/0x344
[<ffffffff80055d86>] vprintk_emit+0x56/0x192
[<ffffffff80055ed8>] vprintk_default+0x16/0x1e
[<ffffffff800563d2>] vprintk+0x72/0x80
[<ffffffff806813b2>] _printk+0x36/0x50
[<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
[<ffffffff808057ec>] paging_init+0x528/0x5bc
[<ffffffff808031ae>] setup_arch+0xd0/0x592
[<ffffffff8080070e>] start_kernel+0x82/0x73c

early_init_fdt_scan_reserved_mem() takes no arguments as it operates on
initial_boot_params, which is populated by early_init_dt_verify(). On
RISC-V, early_init_dt_verify() is called twice. Once, directly, in
setup_arch() if CONFIG_BUILTIN_DTB is not enabled and once indirectly,
very early in the boot process, by parse_dtb() when it calls
early_init_dt_scan_nodes().

This first call uses dtb_early_va to set initial_boot_params, which is
not usable later in the boot process when
early_init_fdt_scan_reserved_mem() is called. On arm64 for example, the
corresponding call to early_init_dt_scan_nodes() uses fixmap addresses
and doesn't suffer the same fate.

Move early_init_fdt_scan_reserved_mem() further along the boot sequence,
after the direct call to early_init_dt_verify() in setup_arch() so that
the names use the correct virtual memory addresses. The above supposed
that CONFIG_BUILTIN_DTB was not set, but should work equally in the case
where it is - unflatted_and_copy_device_tree() also updates
initial_boot_params.

Reported-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/linux-riscv/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20221107151524.3941467-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Merge tag 'drm-fixes-20221109' of git://linuxtv.org/pinchartl/media into drm-fixes

R-Car DSI Kconfig dependency fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y2u8+uM4A006XRPh@pendragon.ideasonboard.com

riscv: vdso: fix build with llvm

Even after commit 89fd4a1df829 ("riscv: jump_label: mark arguments as
const to satisfy asm constraints"), building with CC_OPTIMIZE_FOR_SIZE
+ LLVM=1 can reproduce below build error:

  CC      arch/riscv/kernel/vdso/vgettimeofday.o
In file included from <built-in>:4:
In file included from lib/vdso/gettimeofday.c:5:
In file included from include/vdso/datapage.h:17:
In file included from include/vdso/processor.h:10:
In file included from arch/riscv/include/asm/vdso/processor.h:7:
In file included from include/linux/jump_label.h:112:
arch/riscv/include/asm/jump_label.h:42:3: error:
invalid operand for inline asm constraint 'i'
                "       .option push                            \n\t"
                ^
1 error generated.

I think the problem is when "-Os" is passed as CFLAGS, it's removed by
"CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os" which is
introduced in commit e05d57dcb8c7 ("riscv: Fixup __vdso_gettimeofday
broke dynamic ftrace"), thus no optimization at all for vgettimeofday.c
arm64 does remove "-Os" as well, but it forces "-O2" after removing
"-Os".

I compared the generated vgettimeofday.o with "-O2" and "-Os",
I think no big performance difference. So let's tell the kbuild not
to remove "-Os" rather than follow arm64 style.

vdso related performance can be improved a lot when building kernel with
CC_OPTIMIZE_FOR_SIZE after this commit, ("-Os" VS no optimization)

Fixes: e05d57dcb8c7 ("riscv: Fixup __vdso_gettimeofday broke dynamic ftrace")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221031182943.2453-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

MAINTAINERS: Move Vivien to CREDITS

Last patch from Vivien was nearly 3 years ago and he has not reviewed or
responded to DSA patches since then, move to CREDITS.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221109231907.621678-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

riscv: process: fix kernel info leakage

thread_struct's s[12] may contain random kernel memory content, which
may be finally leaked to userspace. This is a security hole. Fix it
by clearing the s[12] array in thread_struct when fork.

As for kthread case, it's better to clear the s[12] array as well.

Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221029113450.4027-1-jszhang@kernel.org
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/CAJF2gTSdVyAaM12T%2B7kXAdRPGS4VyuO08X1c7paE-n4Fr8OtRA@mail.gmail.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Merge tag 'drm-misc-fixes-2022-11-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v6.1-rc5:
- HDMI fixes to vc4.
- Make panfrost's uapi header compile with C++.
- Add rotation quirks for 2 panels.
- Fix s/r in amdgpu_vram_mgr_new
- Handle 1 gb boundary correctly in panfrost mmu code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e02de501-4b85-28a0-3f6e-751ca13f5f9d@linux.intel.com

Merge tag 'for-6.1-rc4-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- revert memory optimization for scrub blocks, this misses errors in
   2nd and following blocks

- add exception for ENOMEM as reason for transaction abort to not print
   stack trace, syzbot has reported many

- zoned fixes:
      - fix locking imbalance during scrub
      - initialize zones for seeding device
      - initialize zones for cloned device structures

- when looking up device, change assertion to a real check as some of
   the search parameters can be passed by ioctl, reported by syzbot

- fix error pointer check in self tests

* tag 'for-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix locking imbalance on scrub
  btrfs: zoned: initialize device's zone info for seeding
  btrfs: zoned: clone zoned device info when cloning a device
  Revert "btrfs: scrub: use larger block size for data extent scrub"
  btrfs: don't print stack trace when transaction is aborted due to ENOMEM
  btrfs: selftests: fix wrong error check in btrfs_free_dummy_root()
  btrfs: fix match incorrectly in dev_args_match_device

Merge tag 'soundwire-6.1-fixes' of git://git./linux/kernel/git/vkoul/soundwire

Pull soundwire fixes from Vinod Koul:
"Two qcom driver fixes for broadcast completion reinit and check for
  outanding writes. And a lone Intel driver fix for clock stop timeout"

* tag 'soundwire-6.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: qcom: check for outanding writes before doing a read
  soundwire: qcom: reinit broadcast completion
  soundwire: intel: Initialize clock stop timeout

Merge tag 'phy-fixes-6.1' of git://git./linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:
"A bunch of odd driver fixes and a MAINTAINER email update:

   - Update Kishon's email

   - stms32 error code fix in driver probe

   - tegra: fix for checking valid pointer

   - qcom_qmp: null deref fix

   - sunplus: error check fix

   - ralink: add missing sentinel to table"

* tag 'phy-fixes-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: ralink: mt7621-pci: add sentinel to quirks table
  phy: sunplus: Fix an IS_ERR() vs NULL bug in sp_usb_phy_probe
  phy: qcom-qmp-combo: fix NULL-deref on runtime resume
  phy: tegra: xusb: Fix crash during pad power on/down
  phy: stm32: fix an error code in probe
  MAINTAINERS: Update Kishon's email address in GENERIC PHY FRAMEWORK

Merge tag 'hwlock-v6.1' of git://git./linux/kernel/git/remoteproc/linux

Pull hwspinlock updates from Bjorn Andersson:
"I apparently had missed tagging and sending this set of changes out
  during the 6.1 merge window. But did get the associated dts changes
  depending on this merged. The result is a regression in 6.1-rc on the
  affected, older, Qualcomm platforms - in for form of them not booting.

  So while these weren't regression fixes originally, they are now. It's
  not introducing new beahavior, but simply extending the existing new
  Devicetree model, to cover remaining platforms:

   - extend the DeviceTree binding and implementation for the Qualcomm
     hardware spinlock on some older platforms to follow the style of
     the newer ones where the DeviceTree representation does not rely on
     an intermediate syscon node"

* tag 'hwlock-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  dt-bindings: hwlock: qcom-hwspinlock: add syscon to MSM8974
  hwspinlock: qcom: add support for MMIO on older SoCs
  hwspinlock: qcom: correct MMIO max register for newer SoCs
  dt-bindings: hwlock: qcom-hwspinlock: correct example indentation
  dt-bindings: hwlock: qcom-hwspinlock: add support for MMIO on older SoCs

net: macvlan: fix memory leaks of macvlan_common_newlink

kmemleak reports memory leaks in macvlan_common_newlink, as follows:

ip link add link eth0 name .. type macvlan mode source macaddr add
<MAC-ADDR>

kmemleak reports:

unreferenced object 0xffff8880109bb140 (size 64):
  comm "ip", pid 284, jiffies 4294986150 (age 430.108s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b8 aa 5a 12 80 88 ff ff  ..........Z.....
    80 1b fa 0d 80 88 ff ff 1e ff ac af c7 c1 6b 6b  ..............kk
  backtrace:
    [<ffffffff813e06a7>] kmem_cache_alloc_trace+0x1c7/0x300
    [<ffffffff81b66025>] macvlan_hash_add_source+0x45/0xc0
    [<ffffffff81b66a67>] macvlan_changelink_sources+0xd7/0x170
    [<ffffffff81b6775c>] macvlan_common_newlink+0x38c/0x5a0
    [<ffffffff81b6797e>] macvlan_newlink+0xe/0x20
    [<ffffffff81d97f8f>] __rtnl_newlink+0x7af/0xa50
    [<ffffffff81d98278>] rtnl_newlink+0x48/0x70
    ...

In the scenario where the macvlan mode is configured as 'source',
macvlan_changelink_sources() will be execured to reconfigure list of
remote source mac addresses, at the same time, if register_netdevice()
return an error, the resource generated by macvlan_changelink_sources()
is not cleaned up.

Using this patch, in the case of an error, it will execute
macvlan_flush_sources() to ensure that the resource is cleaned up.

Fixes: aa5fd0fb7748 ("driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Link: https://lore.kernel.org/r/20221109090735.690500-1-nashuiliang@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethernet: tundra: free irq when alloc ring failed in tsi108_open()

When alloc tx/rx ring failed in tsi108_open(), it doesn't free irq. Fix
it.

Fixes: 5e123b844a1c ("[PATCH] Add tsi108/9 On Chip Ethernet device driver support")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109044016.126866-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ALSA: hda: fix potential memleak in 'add_widget_node'

As 'kobject_add' may allocated memory for 'kobject->name' when return error.
And in this function, if call 'kobject_add' failed didn't free kobject.
So call 'kobject_put' to recycling resources.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20221110144539.2989354-1-yebin@huaweicloud.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: memalloc: Don't fall back for SG-buffer with IOMMU

When the non-contiguous page allocation for SG buffer allocation
fails, the memalloc helper tries to fall back to the old page
allocation methods.  This would, however, result in the bogus page
addresses when IOMMU is enabled.  Usually in such a case, the fallback
allocation should fail as well, but occasionally it succeeds and
hitting a bad access.

The fallback was thought for non-IOMMU case, and as the error from
dma_alloc_noncontiguous() with IOMMU essentially implies a fatal
memory allocation error, we should return the error straightforwardly
without fallback.  This avoids the corner case like the above.

The patch also renames the local variable "dma_ops" with snd_ prefix
for avoiding the name conflict.

Fixes: a8d302a0b770 ("ALSA: memalloc: Revive x86-specific WC page allocations again")
Reported-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2211041541090.3532114@eliteleevi.tm.intel.com
Link: https://lore.kernel.org/r/20221110132216.30605-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()

When failed to init rxq or txq in mv643xx_eth_open() for opening device,
napi isn't disabled. When open mv643xx_eth device next time, it will
trigger a BUG_ON() in napi_enable(). Compile tested only.

Fixes: 2257e05c1705 ("mv643xx_eth: get rid of receive-side locking")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109025432.80900-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethernet: s2io: disable napi when start nic failed in s2io_card_up()

When failed to start nic or add interrupt service routine in
s2io_card_up() for opening device, napi isn't disabled. When open
s2io device next time, it will trigger a BUG_ON()in napi_enable().
Compile tested only.

Fixes: 5f490c968056 ("S2io: Fixed synchronization between scheduling of napi with card reset and close")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109023741.131552-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'macsec-clear-encryption-keys-in-h-w-drivers'

Antoine Tenart says:

====================
macsec: clear encryption keys in h/w drivers

Commit aaab73f8fba4 ("macsec: clear encryption keys from the stack after
setting up offload") made sure to clean encryption keys from the stack
after setting up offloading but some h/w drivers did a copy of the key
which need to be zeroed as well.

The MSCC PHY driver can actually be converted not to copy the encryption
key at all, but such patch would be quite difficult to backport. I'll
send a following up patch doing this in net-next once this series lands.

Tested on the MSCC PHY but not on the atlantic NIC.
====================

Link: https://lore.kernel.org/r/20221108153459.811293-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: atlantic: macsec: clear encryption keys from the stack

Commit aaab73f8fba4 ("macsec: clear encryption keys from the stack after
setting up offload") made sure to clean encryption keys from the stack
after setting up offloading, but the atlantic driver made a copy and did
not clear it. Fix this.

[4 Fixes tags below, all part of the same series, no need to split this]

Fixes: 9ff40a751a6f ("net: atlantic: MACSec ingress offload implementation")
Fixes: b8f8a0b7b5cb ("net: atlantic: MACSec ingress offload HW bindings")
Fixes: 27736563ce32 ("net: atlantic: MACSec egress offload implementation")
Fixes: 9d106c6dd81b ("net: atlantic: MACSec egress offload HW bindings")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: mscc: macsec: clear encryption keys when freeing a flow

Commit aaab73f8fba4 ("macsec: clear encryption keys from the stack after
setting up offload") made sure to clean encryption keys from the stack
after setting up offloading, but the MSCC PHY driver made a copy, kept
it in the flow data and did not clear it when freeing a flow. Fix this.

Fixes: 28c5107aa904 ("net: phy: mscc: macsec support")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'stmmac-dwmac-loongson-fixes-three-leaks'

Yang Yingliang says:

====================
stmmac: dwmac-loongson: fixes three leaks

patch #2 fixes missing pci_disable_device() in the error path in probe()
patch #1 and pach #3 fix missing pci_disable_msi() and of_node_put() in
error and remove() path.
====================

Link: https://lore.kernel.org/r/20221108114647.4144952-1-yangyingliang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

stmmac: dwmac-loongson: fix missing of_node_put() while module exiting

The node returned by of_get_child_by_name() with refcount decremented,
of_node_put() needs be called when finish using it. So add it in the
error path in loongson_dwmac_probe() and in loongson_dwmac_remove().

Fixes: 2ae34111fe4e ("stmmac: dwmac-loongson: fix invalid mdio_node")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe()

Add missing pci_disable_device() in the error path in loongson_dwmac_probe().

Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting

pci_enable_msi() has been called in loongson_dwmac_probe(),
so pci_disable_msi() needs be called in remove path and error
path of probe().

Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue

For Hamedal C20, the current rate is different from the runtime rate,
snd_usb_endpoint stop and close endpoint to resetting rate.
if snd_usb_endpoint close the endpoint, sometimes usb will
disconnect the device.

Signed-off-by: Ai Chao <aichao@kylinos.cn>
Link: https://lore.kernel.org/r/20221110063452.295110-1-aichao@kylinos.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'amd-drm-fixes-6.1-2022-11-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.1-2022-11-09:

amdgpu:
- SMU 13.0.4 update
- GPUVM TLB race fix
- DCN 3.1.4 fixes
- DCN 3.2.x fixes
- Vega10 fan fix
- BACO fix for Beige Goby board
- PSR fix
- GPU VM PT locking fixes

amdkfd:
- CRIU fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109234554.6028-1-alexander.deucher@amd.com

ALSA: hda/realtek: Add Positivo C6300 model quirk

Positivo Master C6300 (1849:a233) require quirk for anabling headset-mic

Signed-off-by: Edson Juliano Drosdeck <edson.drosdeck@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20221109171732.5417-1-edson.drosdeck@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open()

When t4vf_update_port_info() failed in cxgb4vf_open(), resources applied
during adapter goes up are not cleared. Fix it. Only be compiled, not be
tested.

Fixes: 18d79f721e0a ("cxgb4vf: Update port information in cxgb4vf_open()")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109012100.99132-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp: Fix an error handling path in mctp_init()

If mctp_neigh_init() return error, the routes resources should
be released in the error handling path. Otherwise some resources
leak.

Fixes: 4d8b9319282a ("mctp: Add neighbour implementation")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20221108095517.620115-1-weiyongjun@huaweicloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz

Current Intel platform has an output of ~976ms interval
when probed on 1 Pulse-per-Second(PPS) hardware pin.

The correct PTP clock frequency for PCH GbE should be 204.8MHz
instead of 200MHz. PSE GbE PTP clock rate remains at 200MHz.

Fixes: 58da0cfa6cf1 ("net: stmmac: create dwmac-intel.c to contain all Intel platform")
Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com>
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Gan Yi Fang <yi.fang.gan@intel.com>
Link: https://lore.kernel.org/r/20221108020811.12919-1-yi.fang.gan@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()

When failed to bind qsets in cxgb_up() for opening device, napi isn't
disabled. When open cxgb3 device next time, it will trigger a BUG_ON()
in napi_enable(). Compile tested only.

Fixes: 48c4b6dbb7e2 ("cxgb3 - fix port up/down error path")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109021451.121490-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: cpsw: disable napi in cpsw_ndo_open()

When failed to create xdp rxqs or fill rx channels in cpsw_ndo_open() for
opening device, napi isn't disabled. When open cpsw device next time, it
will report a invalid opcode issue. Compiled tested only.

Fixes: d354eb85d618 ("drivers: net: cpsw: dual_emac: simplify napi usage")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109011537.96975-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drm/amd/display: only fill dirty rectangles when PSR is enabled

Currently, we are calling fill_dc_dirty_rects() even if PSR isn't
supported by the relevant link in amdgpu_dm_commit_planes(), this is
undesirable especially because when drm.debug is enabled we are printing
messages in fill_dc_dirty_rects() that are only useful for debugging PSR
(and confusing otherwise). So, we can instead limit the filling of dirty
rectangles to only when PSR is enabled.

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disable BACO on special BEIGE_GOBY card

Still avoid intermittent failure.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: Drop eviction lock when allocating PT BO

Re-take the eviction lock immediately again after the allocation is
completed, to fix circular locking warning with drm_buddy allocator.

Move amdgpu_vm_eviction_lock/unlock/trylock to amdgpu_vm.h as they are
called from multiple files.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Unlock bo_list_mutex after error handling

Get below kernel WARNING backtrace when pressing ctrl-C to kill kfdtest
application.

If amdgpu_cs_parser_bos returns error after taking bo_list_mutex, as
caller amdgpu_cs_ioctl will not unlock bo_list_mutex, this generates the
kernel WARNING.

Add unlock bo_list_mutex after amdgpu_cs_parser_bos error handling to
cleanup bo_list userptr bo.

WARNING: kfdtest/2930 still has locks held!
1 lock held by kfdtest/2930:
  (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0xce5/0x1f10 [amdgpu]
  stack backtrace:
   dump_stack_lvl+0x44/0x57
   get_signal+0x79f/0xd00
   arch_do_signal_or_restart+0x36/0x7b0
   exit_to_user_mode_prepare+0xfd/0x1b0
   syscall_exit_to_user_mode+0x19/0x40
   do_syscall_64+0x40/0x80

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly""

This reverts commit 4545ae2ed3f2f7c3f615a53399c9c8460ee5bca7.

The origin patch "drm/amdgpu: getting fan speed pwm for vega10 properly" works fine.
Test failure is caused by test case self.

Signed-off-by: Asher Song <Asher.Song@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Enforce minimum prefetch time for low memclk on DCN32

[WHY?]
Data return times when using lowest memclk can be <= 60us, which can cause
underflow on high bandwidth displays with a workload.

[HOW?]
Enforce a minimum prefetch time during validation for low memclk modes.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix gpio port mapping issue

[Why]
1. Port of gpio has different mapping.

[How]
1. Add a dummy entry in mapping table.
2. Fix incorrect mask bit field access.

Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Steve Su <steve.su@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix reg timeout in enc314_enable_fifo

[Why]
The link enablement sequence can end up resetting the encoder while
the PHY symclk isn't yet on.

This means that waiting for symclk on will timeout, along with the reset
bit never asserting high.

This causes unnecessary delay when enabling the link and produces a
warning affecting multiple IGT tests.

[How]
Don't wait for the symclk to be on here because firmware already does.

Don't wait for reset if we know the symclk isn't on.

Split the reset into a helper function that checks the bit and decides
whether or not a delay is sufficient.

Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x

drm/amd/display: Fix FCLK deviation and tool compile issues

[Why]
Recent backports from open source do not have header inclusion pattern
that is consistent with inclusion style in the rest of the file. This
breaks the internal tool builds as well. A recent commit erronously
modified the original DML formula for calculating
ActiveClockChangeLatencyHidingY. This resulted in a FCLK deviation
from the golden values.

[How]
Change the way in which display_mode_vba.h is included so that it is
consistent with the inclusion style in rest of the file which also fixes
the tool build. Restore the DML formula to its original state to fix the
FCLK deviation.

Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Zeromem mypipe heap struct before using it

[Why&How]
Bug was caused when moving variable from stack to heap because it was reusable
and garbage was left over, so we need to zero mem.

Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Martin Leung <Martin.Leung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Update SR watermarks for DCN314

[Why & How]
New values requested by hardware after fine-tuning.
Update for all memory types.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x

drm/amdgpu: workaround for TLB seq race

It can happen that we query the sequence value before the callback
had a chance to run.

Workaround that by grabbing the fence lock and releasing it again.
Should be replaced by hw handling soon.

Signed-off-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org # 5.19+
Fixes: 5255e146c99a6 ("drm/amdgpu: rework TLB flushing")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Stefan Springer <stefanspr94@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix error handling in criu_checkpoint

Checkpoint BOs last. That way we don't need to close dmabuf FDs if
something else fails later. This avoids problematic access to user mode
memory in the error handling code path.

criu_checkpoint_bos has its own error handling and cleanup that does not
depend on access to user memory.

In the private data, keep BOs before the remaining objects. This is
necessary to restore things in the correct order as restoring events
depends on the events-page BO being restored first.

Fixes: be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Reported-by: Jann Horn <jannh@google.com>
CC: Rajneesh Bhardwaj <Rajneesh.Bhardwaj@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdkfd: Fix error handling in kfd_criu_restore_events

mutex_unlock before the exit label because all the error code paths that
jump there didn't take that lock. This fixes unbalanced locking errors
in case of restore errors.

Fixes: 40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amd/pm: update SMU IP v13.0.4 msg interface header

Some of the unused messages that were used earlier in development have
been freed up as spare messages, no intended functional changes.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x

iavf: Fix VF driver counting VLAN 0 filters

VF driver mistakenly counts VLAN 0 filters, when no PF driver
counts them.
Do not count VLAN 0 filters, when VLAN_V2 is engaged.
Counting those filters in, will affect filters size by -1, when
sending batched VLAN addition message.

Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Fix spurious interrupt during removal of trusted VF

Previously, during removal of trusted VF when VF is down there was
number of spurious interrupt equal to number of queues on VF.

Add check if VF already has inactive queues. If VF is disabled and
has inactive rx queues then do not disable rx queues.
Add check in ice_vsi_stop_tx_ring if it's VF's vsi and if VF is
disabled.

Fixes: efe41860008e ("ice: Fix memory corruption in VF driver")
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge tag 'slab-for-6.1-rc4-fixes' of git://git./linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:
"Most are small fixups as described below.

  The !CONFIG_TRACING fix is a bit bigger and would normally be done in
  the next merge window as part of upcoming hardening changes. But we
  realized it can make the kmalloc waste tracking introduced in this
  window inaccurate, so decided to go with it now.

  Summary:

   - Remove !CONFIG_TRACING kmalloc() wrappers intended to save a
     function call, due to incompatilibity with recently introduced
     wasted space tracking and planned hardening changes.

   - A tracing parameter regression fix, by Kees Cook.

   - Two kernel-doc warning fixups, by Lukas Bulwahn and myself

* tag 'slab-for-6.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm, slab: remove duplicate kernel-doc comment for ksize()
  mm/slab_common: Restore passing "caller" for tracing
  mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()
  mm/slab_common: repair kernel-doc for __ksize()

net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions

esw_attr is only allocated if namespace is fdb.

BUG: KASAN: slab-out-of-bounds in parse_tc_actions+0xdc6/0x10e0 [mlx5_core]
Write of size 4 at addr ffff88815f185b04 by task tc/2135

CPU: 5 PID: 2135 Comm: tc Not tainted 6.1.0-rc2+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
print_report+0x170/0x471
? parse_tc_actions+0xdc6/0x10e0 [mlx5_core]
kasan_report+0xbc/0xf0
? parse_tc_actions+0xdc6/0x10e0 [mlx5_core]
parse_tc_actions+0xdc6/0x10e0 [mlx5_core]

Fixes: 94d651739e17 ("net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: E-Switch, Fix comparing termination table instance

The pkt_reformat pointer being saved under flow_act and not
dest attribute in the termination table instance.
Fix the comparison pointers.

Also fix returning success if one pkt_reformat pointer is null
and the other is not.

Fixes: 249ccc3c95bd ("net/mlx5e: Add support for offloading traffic from uplink to uplink")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Fix wrong rejection of packet-per-second policing

In the bellow commit, we added support for PPS policing without
removing the check which block offload of such cases.
Fix it by removing this check.

Fixes: a8d52b024d6d ("net/mlx5e: TC, Support offloading police action")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix tc acts array not to be dependent on enum order

The tc acts array should not be dependent on kernel internal
flow action id enum. Fix the array initialization.

Fixes: fad547906980 ("net/mlx5e: Add tc action infrastructure")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix usage of DMA sync API

DMA sync functions should use the same direction that was used by DMA
mapping. Use DMA_BIDIRECTIONAL for XDP_TX from regular RQ, which reuses
the same mapping that was used for RX, and DMA_TO_DEVICE for XDP_TX from
XSK RQ and XDP_REDIRECT, which establish a new mapping in this
direction. On the RX side, use the same direction that was used when
setting up the mapping (DMA_BIDIRECTIONAL for XDP, DMA_FROM_DEVICE
otherwise).

Also don't skip sync for device when establishing a DMA_FROM_DEVICE
mapping for RX, as some architectures (ARM) may require invalidating
caches before the device can use the mapping. It doesn't break the
bugfix made in
commit 0b7cfa4082fb ("net/mlx5e: Fix page DMA map/unmap attributes"),
since the bug happened on unmap.

Fixes: 0b7cfa4082fb ("net/mlx5e: Fix page DMA map/unmap attributes")
Fixes: b5503b994ed5 ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add missing sanity checks for max TX WQE size

The commit cited below started using the firmware capability for the
maximum TX WQE size. This commit adds an important check to verify that
the driver doesn't attempt to exceed this capability, and also restores
another check mistakenly removed in the cited commit (a WQE must not
exceed the page size).

Fixes: c27bd1718c06 ("net/mlx5e: Read max WQEBBs on the SQ from firmware")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: fw_reset: Don't try to load device in case PCI isn't working

In case PCI reads fail after unload, there is no use in trying to
load the device.

Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-switch, Set to legacy mode if failed to change switchdev mode

No need to rollback to the other mode because probably will fail
again. Just set to legacy mode and clear fdb table created flag.
So that fdb table will not be cleared again.

Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Allow async trigger completion execution on single CPU systems

For a single CPU system, the kernel thread executing mlx5_cmd_flush()
never releases the CPU but calls down_trylock(&cmd→sem) in a busy loop.
On a single processor system, this leads to a deadlock as the kernel
thread which executes mlx5_cmd_invoke() never gets scheduled. Fix this,
by adding the cond_resched() call to the loop, allow the command
completion kernel thread to execute.

Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Bridge, verify LAG state when adding bond to bridge

Mlx5 LAG is initialized asynchronously on a workqueue which means that for
a brief moment after setting mlx5 UL representors as lower devices of a
bond netdevice the LAG itself is not fully initialized in the driver. When
adding such bond device to a bridge mlx5 bridge code will not consider it
as offload-capable, skip creating necessary bookkeeping and fail any
further bridge offload-related commands with it (setting VLANs, offloading
FDBs, etc.). In order to make the error explicit during bridge
initialization stage implement the code that detects such condition during
NETDEV_PRECHANGEUPPER event and returns an error.

Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Merge tag 'kvm-s390-master-6.1-1' of https://git./linux/kernel/git/kvms390/linux into HEAD

A PCI allocation fix and a PV clock fix.

KVM: x86/pmu: Limit the maximum number of supported AMD GP counters

The AMD PerfMonV2 specification allows for a maximum of 16 GP counters,
but currently only 6 pairs of MSRs are accepted by KVM.

While AMD64_NUM_COUNTERS_CORE is already equal to 6, increasing without
adjusting msrs_to_save_all[] could result in out-of-bounds accesses.
Therefore introduce a macro (named KVM_AMD_PMC_MAX_GENERIC) to
refer to the number of counters supported by KVM.

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220919091008.60695-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: Limit the maximum number of supported Intel GP counters

The Intel Architectural IA32_PMCx MSRs addresses range allows for a
maximum of 8 GP counters, and KVM cannot address any more. Introduce a
local macro (named KVM_INTEL_PMC_MAX_GENERIC) and use it consistently to
refer to the number of counters supported by KVM, thus avoiding possible
out-of-bound accesses.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220919091008.60695-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet

The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF)
that limits the theoretical maximum value of the Intel GP PMC MSRs
allocated at 0xC1 to 14; likewise the Intel April 2022 SDM adds
IA32_OVERCLOCKING_STATUS at 0x195 which limits the number of event
selection MSRs to 15 (0x186-0x194).

Limiting the maximum number of counters to 14 or 18 based on the currently
allocated MSRs is clearly fragile, and it seems likely that Intel will
even place PMCs 8-15 at a completely different range of MSR indices.
So stop at the maximum number of GP PMCs supported today on Intel
processors.

There are some machines, like Intel P4 with non Architectural PMU, that
may indeed have 18 counters, but those counters are in a completely
different MSR address range and are not supported by KVM.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Fixes: cf05a67b68b8 ("KVM: x86: omit "impossible" pmu MSRs from MSR list")
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220919091008.60695-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Only dump VMSA to klog at KERN_DEBUG level

Explicitly print the VMSA dump at KERN_DEBUG log level, KERN_CONT uses
KERNEL_DEFAULT if the previous log line has a newline, i.e. if there's
nothing to continuing, and as a result the VMSA gets dumped when it
shouldn't.

The KERN_CONT documentation says it defaults back to KERNL_DEFAULT if the
previous log line has a newline. So switch from KERN_CONT to
print_hex_dump_debug().

Jarkko pointed this out in reference to the original patch. See:
https://lore.kernel.org/all/YuPMeWX4uuR1Tz3M@kernel.org/
print_hex_dump(KERN_DEBUG, ...) was pointed out there, but
print_hex_dump_debug() should similar.

Fixes: 6fac42f127b8 ("KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog")
Signed-off-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Harald Hoyer <harald@profian.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Message-Id: <20221104142220.469452-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tools/kvm_stat: update exit reasons for vmx/svm/aarch64/userspace

Update EXIT_REASONS from source, including VMX_EXIT_REASONS,
SVM_EXIT_REASONS, AARCH64_EXIT_REASONS, USERSPACE_EXIT_REASONS.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_00082C8BFA925A65E11570F417F1CD404505@qq.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tools/kvm_stat: fix incorrect detection of debugfs

The first field in /proc/mounts can be influenced by unprivileged users
through the widespread `fusermount` setuid-root program. Example:

```
user$ mkdir ~/mydebugfs
user$ export _FUSE_COMMFD=0
user$ fusermount ~/mydebugfs -ononempty,fsname=debugfs
user$ grep debugfs /proc/mounts
debugfs /home/user/mydebugfs fuse rw,nosuid,nodev,relatime,user_id=1000,group_id=100 0 0
```

If there is no debugfs already mounted in the system then this can be
used by unprivileged users to trick kvm_stat into using a user
controlled file system location for obtaining KVM statistics.
Even though the root user is not allowed to access non-root FUSE mounts
for security reasons, the unprivileged user can unmount the FUSE mount
before kvm_stat uses the mounted path. If it wins the race, kvm_stat
will read from the location where the FUSE mount resided.

Note that the files in debugfs are only opened for reading, so the
attacker can cause very large data to be read in by kvm_stat, or fake
data to be processed, but there should be no viable way to turn this
into a privilege escalation.

The fix is simply to use the file system type field instead. Whitespace
in the mount path is escaped in /proc/mounts thus no further safety
measures in the parsing should be necessary to make this correct.

Message-Id: <20221103135927.13656-1-matthias.gerstner@suse.de>
Signed-off-by: Matthias Gerstner <matthias.gerstner@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callers

x86_virt_spec_ctrl only deals with the paravirtualized
MSR_IA32_VIRT_SPEC_CTRL now and does not handle MSR_IA32_SPEC_CTRL
anymore; remove the corresponding, unused argument.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly

Restoration of the host IA32_SPEC_CTRL value is probably too late
with respect to the return thunk training sequence.

With respect to the user/kernel boundary, AMD says, "If software chooses
to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel
exit), software should set STIBP to 1 before executing the return thunk
training sequence." I assume the same requirements apply to the guest/host
boundary. The return thunk training sequence is in vmenter.S, quite close
to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's
IA32_SPEC_CTRL value is not restored until much later.

To avoid this, move the restoration of host SPEC_CTRL to assembly and,
for consistency, move the restoration of the guest SPEC_CTRL as well.
This is not particularly difficult, apart from some care to cover both
32- and 64-bit, and to share code between SEV-ES and normal vmentry.

Cc: stable@vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: restore host save area from assembly

Allow access to the percpu area via the GS segment base, which is
needed in order to access the saved host spec_ctrl value. In linux-next
FILL_RETURN_BUFFER also needs to access percpu data.

For simplicity, the physical address of the save area is added to struct
svm_cpu_data.

Cc: stable@vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Analyzed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: move guest vmsave/vmload back to assembly

It is error-prone that code after vmexit cannot access percpu data
because GSBASE has not been restored yet. It forces MSR_IA32_SPEC_CTRL
save/restore to happen very late, after the predictor untraining
sequence, and it gets in the way of return stack depth tracking
(a retbleed mitigation that is in linux-next as of 2022-11-09).

As a first step towards fixing that, move the VMCB VMSAVE/VMLOAD to
assembly, essentially undoing commit fb0c4a4fee5a ("KVM: SVM: move
VMLOAD/VMSAVE to C code", 2021-03-15). The reason for that commit was
that it made it simpler to use a different VMCB for VMLOAD/VMSAVE versus
VMRUN; but that is not a big hassle anymore thanks to the kvm-asm-offsets
machinery and other related cleanups.

The idea on how to number the exception tables is stolen from
a prototype patch by Peter Zijlstra.

Cc: stable@vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: do not allocate struct svm_cpu_data dynamically

The svm_data percpu variable is a pointer, but it is allocated via
svm_hardware_setup() when KVM is loaded. Unlike hardware_enable()
this means that it is never NULL for the whole lifetime of KVM, and
static allocation does not waste any memory compared to the status quo.
It is also more efficient and more easily handled from assembly code,
so do it and don't look back.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: remove dead field from struct svm_cpu_data

The "cpu" field of struct svm_cpu_data has been write-only since commit
4b656b120249 ("KVM: SVM: force new asid on vcpu migration", 2009-08-05).
Remove it.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: remove unused field from struct vcpu_svm

The pointer to svm_cpu_data in struct vcpu_svm looks interesting from
the point of view of accessing it after vmexit, when the GSBASE is still
containing the guest value. However, despite existing since the very
first commit of drivers/kvm/svm.c (commit 6aa8b732ca01, "[PATCH] kvm:
userspace interface", 2006-12-10), it was never set to anything.

Ignore the opportunity to fix a 16 year old "bug" and delete it; doing
things the "harder" way makes it possible to remove more old cruft.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>