review.tizen.org Git - platform/kernel/linux-starfive.git/log

cxl/mbox: Add missing parameter to docs.

Kernel-doc should be complete, so add documentation for the status
parameter.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230130153437.3153-1-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/test: Simulate event log overflow

Log overflow is marked by a separate trace message.

Simulate a log with lots of messages and flag overflow until space is
cleared.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-8-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/test: Add specific events

Each type of event has different trace point outputs.

Add mock General Media Event, DRAM event, and Memory Module Event
records to the mock list of events returned.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-7-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/test: Add generic mock events

Facilitate testing basic Get/Clear Event functionality by creating
multiple logs and generic events with made up UUID's.

Data is completely made up with data patterns which should be easy to
spot in trace output.

A single sysfs entry resets the event data and triggers collecting the
events for testing.

Test traces are easy to obtain with a small script such as this:

#!/bin/bash -x

devices=`find /sys/devices/platform -name cxl_mem*`

# Turn on tracing
echo "" > /sys/kernel/tracing/trace
echo 1 > /sys/kernel/tracing/events/cxl/enable
echo 1 > /sys/kernel/tracing/tracing_on

# Generate fake interrupt
for device in $devices; do
echo 1 > $device/event_trigger
done

# Turn off tracing and report events
echo 0 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-6-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/mem: Trace Memory Module Event Record

CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.

Determine if the event read is memory module record and if so trace the
record.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-5-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/mem: Trace DRAM Event Record

CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.

Determine if the event read is a DRAM event record and if so trace the
record.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-4-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/mem: Trace General Media Event Record

CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.

Determine if the event read is a general media record and if so trace
the record as a General Media Event Record.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-3-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/mem: Wire up event interrupts

Currently the only CXL features targeted for irq support require their
message numbers to be within the first 16 entries.  The device may
however support less than 16 entries depending on the support it
provides.

Attempt to allocate these 16 irq vectors.  If the device supports less
then the PCI infrastructure will allocate that number.  Upon successful
allocation, users can plug in their respective isr at any point
thereafter.

CXL device events are signaled via interrupts.  Each event log may have
a different interrupt message number.  These message numbers are
reported in the Get Event Interrupt Policy mailbox command.

Add interrupt support for event logs.  Interrupts are allocated as
shared interrupts.  Therefore, all or some event logs can share the same
message number.

In addition all logs are queried on any interrupt in order of the most
to least severe based on the status register.

Finally place all event configuration logic into cxl_event_config().
Previously the logic was a simple 'read all' on start up.  But
interrupts must be configured prior to any reads to ensure no events are
missed.  A single event configuration function results in a cleaner over
all implementation.

Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-2-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/mem: Read, trace, and clear events on driver load

CXL devices have multiple event logs which can be queried for CXL event
records.  Devices are required to support the storage of at least one
event record in each event log type.

Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.

Software queries events via the Get Event Record mailbox command; CXL
rev 3.0 section 8.2.9.2.2 and clears events via CXL rev 3.0 section
8.2.9.2.3 Clear Event Records mailbox command.

If the result of negotiating CXL Error Reporting Control is OS control,
read and clear all event logs on driver load.

Ensure a clean slate of events by reading and clearing the events on
driver load.

The status register is not used because a device may continue to trigger
events and the only requirement is to empty the log at least once.  This
allows for the required transition from empty to non-empty for interrupt
generation.  Handling of interrupts is in a follow on patch.

The device can return up to 1MB worth of event records per query.
Allocate a shared large buffer to handle the max number of records based
on the mailbox payload size.

This patch traces a raw event record and leaves specific event record
type tracing to subsequent patches.  Macros are created to aid in
tracing the common CXL Event header fields.

Each record is cleared explicitly.  A clear all bit is specified but is
only valid when the log overflows.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-1-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/port: Link the 'parent_dport' in portX/ and endpointX/ sysfs

Similar to the justification in:

1b58b4cac6fc ("cxl/port: Record parent dport when adding ports")

...userspace wants to know the routing information for ports for
calculating the memdev order for region creation among other things.
Cache the information the kernel discovers at enumeration time in a
'parent_dport' attribute to save userspace the time of trawling sysfs
to recover the same information.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167124082375.1626103.6047000000121298560.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/region: Clarify when a cxld->commit() callback is mandatory

Both cxl_switch_decoders() and cxl_endpoint_decoders() are considered by
cxl_region_decode_commit(). Flag cases where cxl_switch_decoders with
multiple targets, or cxl_endpoint_decoders do not have a commit callback
set. The switch case is unlikely to happen since switches are only
enumerated by the CXL core, but the endpoint case may support decoders
defined by drivers outside of drivers/cxl, like accerator drivers.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167124081824.1626103.1555704405392757219.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

tools/testing/cxl: require 64-bit

size_t is limited to 32-bits and so the gen_pool_alloc() using
the size of SZ_64G would map to 0, triggering a low allocation
which is not expected. Force the dependency on 64-bit for cxl_test
as that is what it was designed for.

This issue was found by build test reports when converting this
driver as a proper upstream driver.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20221219195050.325959-1-mcgrof@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/pci: Show opcode in debug messages when sending a command

For debugging it is very helpful to see which commands are sent. Add
it to the debug message.

Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20230103210151.1126873-1-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

tools/testing/cxl: Prevent cxl_test from confusing production modules

The cxl_test machinery builds modified versions of the modules in
drivers/cxl/ and intercepts some of their calls to allow cxl_test to
inject mock CXL topologies for test.

However, if cxl_test attempts the same with production modules,
fireworks ensue as Luis discovered [1]. Prevent that scenario by
arranging for cxl_test to check for a "watermark" symbol in each of the
modules it expects to be modified before the test can run. This turns
undefined runtime behavior or crashes into a safer failure to load the
cxl_test module.

Link: http://lore.kernel.org/r/20221209062919.1096779-1-mcgrof@kernel.org
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/region: Only warn about cpu_cache_invalidate_memregion() once

No need for more than once per module load.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20221215183836.24136-1-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

PCI/CXL: Export native CXL error reporting control

CXL _OSC Error Reporting Control is used by the OS to determine if
Firmware has control of various CXL error reporting capabilities
including the event logs.

Expose the result of negotiating CXL Error Reporting Control in struct
pci_host_bridge for consumption by the CXL drivers.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: linux-pci@vger.kernel.org
Cc: linux-acpi@vger.kernel.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20221212070627.1372402-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cxl/pci: Move tracepoint definitions to drivers/cxl/core/

CXL is using tracepoints for reporting RAS capability register payloads
for AER events, and has plans to use tracepoints for the output payload
of Get Poison List and Get Event Records commands. For organization
purposes it would be nice to keep those all under a single + local CXL
trace system. This also organization also potentially helps in the
future when CXL drivers expand beyond generic memory expanders, however
that would also entail a move away from the expander-specific
cxl_dev_state context, save that for later.

Note that the powerpc-specific drivers/misc/cxl/ also defines a 'cxl'
trace system, however, it is unlikely that a single platform will ever
load both drivers simultaneously.

Cc: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167051869176.436579.9728373544811641087.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Linux 6.2-rc2

Merge tag 'perf_urgent_for_v6.2_rc2' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

- Pass only an initialized perf event attribute to the LSM hook

- Fix a use-after-free on the perf syscall's error path

- A potential integer overflow fix in amd_core_pmu_init()

- Fix the cgroup events tracking after the context handling rewrite

- Return the proper value from the inherit_event() function on error

* tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Call LSM hook after copying perf_event_attr
  perf: Fix use-after-free in error path
  perf/x86/amd: fix potential integer overflow on shift of a int
  perf/core: Fix cgroup events tracking
  perf core: Return error pointer if inherit_event() fails to find pmu_ctx

Merge tag 'x86_urgent_for_v6.2_rc2' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Two fixes to correct how kprobes handles INT3 now that they're added
   by other functionality like the rethunks and not only kgdb

- Remove __init section markings of two functions which are referenced
   by a function in the .text section

* tag 'x86_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
  x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK
  x86/calldepth: Fix incorrect init section references

Merge tag 'locking_urgent_for_v6.2_rc2' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

- Prevent the leaking of a debug timer in futex_waitv()

- A preempt-RT mutex locking fix, adding the proper acquire semantics

* tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error
rtmutex: Add acquire semantics for rtmutex lock acquisition slow path

Merge tag 'drm-fixes-2023-01-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"I'm just back from the mountains, and Dave is out at the beach and
  should be back in a week again. Just i915 fixes and since Rodrigo
  bothered to make the pull last week I figured I should warm up gpg and
  forward this in a nice signed tag as a new years present!

   - i915 fixes for newer platforms

   - i915 locking rework to not give up in vm eviction fallback path too
     early"

* tag 'drm-fixes-2023-01-01' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/dsi: fix MIPI_BKLT_EN_1 native GPIO index
  drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence
  drm/i915/uc: Fix two issues with over-size firmware files
  drm/i915: improve the catch-all evict to handle lock contention
  drm/i915: Remove __maybe_unused from mtl_info
  drm/i915: fix TLB invalidation for Gen12.50 video and compute engines

Merge tag 'drm-intel-fixes-2022-12-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- fix TLB invalidation for DG2 and newer platforms. (Andrzej)
- Remove __maybe_unused from mtl_info (Lucas)
- improve the catch-all evict to handle lock contention (Matt Auld)
- Fix two issues with over-size (GuC/HuC) firmware files (John)
- Fix DSI resume issues on ICL+ (Jani)

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y662ijDHrZCjTFla@intel.com

Merge tag 'kbuild-fixes-v6.2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix broken BuildID

- Add srcrpm-pkg to the help message

- Fix the option order for modpost built with musl libc

- Fix the build dependency of rpm-pkg for openSUSE

* tag 'kbuild-fixes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  fixdep: remove unneeded <stdarg.h> inclusion
  kbuild: sort single-targets alphabetically again
  kbuild: rpm-pkg: add libelf-devel as alternative for BuildRequires
  kbuild: Fix running modpost with musl libc
  kbuild: add a missing line for help message
  .gitignore: ignore *.rpm
  arch: fix broken BuildID for arm64 and riscv
  kconfig: Add static text for search information in help menu

Merge tag 'ata-6.2-rc2' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fix from Damien Le Moal:
"A single fix to address an issue with wake from suspend with PCS
adapters, from Adam"

* tag 'ata-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci: Fix PCS quirk application for suspend

Merge tag 'acpi-6.2-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These are new ACPI IRQ override quirks, low-power S0 idle (S0ix)
  support adjustments and ACPI backlight handling fixes, mostly for
  platforms using AMD chips.

  Specifics:

   - Add ACPI IRQ override quirks for Asus ExpertBook B2502, Lenovo
     14ALC7, and XMG Core 15 (Hans de Goede, Adrian Freund, Erik
     Schumacher).

   - Adjust ACPI video detection fallback path to prevent
     non-operational ACPI backlight devices from being created on
     systems where the native driver does not detect a suitable panel
     (Mario Limonciello).

   - Fix Apple GMUX backlight detection (Hans de Goede).

   - Add a low-power S0 idle (S0ix) handling quirk for HP Elitebook 865
     and stop using AMD-specific low-power S0 idle code path for systems
     with Rembrandt chips and newer (Mario Limonciello)"

* tag 'acpi-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: x86: s2idle: Stop using AMD specific codepath for Rembrandt+
  ACPI: x86: s2idle: Force AMD GUID/_REV 2 on HP Elitebook 865
  ACPI: video: Fix Apple GMUX backlight detection
  ACPI: resource: Add Asus ExpertBook B2502 to Asus quirks
  ACPI: resource: do IRQ override on Lenovo 14ALC7
  ACPI: resource: do IRQ override on XMG Core 15
  ACPI: video: Don't enable fallback path for creating ACPI backlight by default
  drm/amd/display: Report to ACPI video if no panels were found
  ACPI: video: Allow GPU drivers to report no panels

Merge tag 'sound-6.2-rc2' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a few small fixes:

   - A regression fix for HDMI audio on HD-audio AMD codecs

   - Fixes for LINE6 MIDI handling

   - HD-audio quirk for Dell laptops"

* tag 'sound-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: Static PCM mapping again with AMD HDMI codecs
  ALSA: hda/realtek: Apply dual codec fixup for Dell Latitude laptops
  ALSA: line6: fix stack overflow in line6_midi_transmit
  ALSA: line6: correct midi status byte when receiving data from podxt

Merge branches 'acpi-resource' and 'acpi-video'

Merge ACPI resource handling quirks and ACPI backlight handling fixes
for 6.2-rc2:

- Add ACPI IRQ override quirks for Asus ExpertBook B2502, Lenovo
   14ALC7, and XMG Core 15 (Hans de Goede, Adrian Freund,  Erik
   Schumacher).

- Adjust ACPI video detection fallback path to prevent non-operational
   ACPI backlight devices from being created on systems where the native
   driver does not detect a suitable panel (Mario Limonciello).

- Fix Apple GMUX backlight detection (Hans de Goede).

* acpi-resource:
  ACPI: resource: Add Asus ExpertBook B2502 to Asus quirks
  ACPI: resource: do IRQ override on Lenovo 14ALC7
  ACPI: resource: do IRQ override on XMG Core 15

* acpi-video:
  ACPI: video: Fix Apple GMUX backlight detection
  ACPI: video: Don't enable fallback path for creating ACPI backlight by default
  drm/amd/display: Report to ACPI video if no panels were found
  ACPI: video: Allow GPU drivers to report no panels

drm/i915/dsi: fix MIPI_BKLT_EN_1 native GPIO index

Due to copy-paste fail, MIPI_BKLT_EN_1 would always use PPS index 1,
never 0. Fix the sloppiest commit in recent memory.

Fixes: 963bbdb32b47 ("drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221220140105.313333-1-jani.nikula@intel.com
(cherry picked from commit a561933c571798868b5fa42198427a7e6df56c09)
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence

Starting from ICL, the default for MIPI GPIO sequences seems to be using
native GPIOs i.e. GPIOs available in the GPU. These native GPIOs reuse
many pins that quite frankly seem scary to poke based on the VBT
sequences. We pretty much have to trust that the board is configured
such that the relevant HPD, PP_CONTROL and GPIO bits aren't used for
anything else.

MIPI sequence v4 also adds a flag to fall back to non-native sequences.

v5:
- Wrap SHOTPLUG_CTL_DDI modification in spin_lock() in icp_irq_handler()
too (Ville)
- References instead of Closes issue 6131 because this does not fix everything

v4:
- Wrap SHOTPLUG_CTL_DDI modification in spin_lock_irq() (Ville)

v3:
- Fix -Wbitwise-conditional-parentheses (kernel test robot <lkp@intel.com>)

v2:
- Fix HPD pin output set (impacts GPIOs 0 and 5)
- Fix GPIO data output direction set (impacts GPIOs 4 and 9)
- Reduce register accesses to single intel_de_rwm()

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6131
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221219105955.4014451-1-jani.nikula@intel.com
(cherry picked from commit f087cfe6fcff58044f7aa3b284965af47f472fb0)
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

fixdep: remove unneeded <stdarg.h> inclusion

This is unneeded since commit 69304379ff03 ("fixdep: use fflush() and
ferror() to ensure successful write to files").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: sort single-targets alphabetically again

This was previously alphabetically sorted. Sort it again.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>

kbuild: rpm-pkg: add libelf-devel as alternative for BuildRequires

Guoqing Jiang reports that openSUSE cannot compile the kernel rpm due
to "BuildRequires: elfutils-libelf-devel" added by commit 8818039f959b
("kbuild: add ability to make source rpm buildable using koji").
The relevant package name in openSUSE is libelf-devel.

Add it as an alternative package.

BTW, if it is impossible to solve the build requirement, the final
resort would be:

    $ make RPMOPTS=--nodeps rpm-pkg

This passes --nodeps to the rpmbuild command so it will not verify
build dependencies. This is useful to test rpm builds on non-rpm
system. On Debian/Ubuntu, for example, you can install rpmbuild by
'apt-get install rpm'.

NOTE1:
  Likewise, it is possible to bypass the build dependency check for
  debian package builds:

    $ make DPKG_FLAGS=-d deb-pkg

NOTE2:
  The 'or' operator is supported since RPM 4.13. So, old distros such
  as CentOS 7 will break. I suggest installing newer rpmbuild in such
  cases.

Link: https://lore.kernel.org/linux-kbuild/ee227d24-9c94-bfa3-166a-4ee6b5dfea09@linux.dev/T/#u
Fixes: 8818039f959b ("kbuild: add ability to make source rpm buildable using koji")
Reported-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>

kbuild: Fix running modpost with musl libc

commit 3d57e1b7b1d4 ("kbuild: refactor the prerequisites of the modpost
rule") moved 'vmlinux.o' inside modpost-args, possibly before some of
the other options. However, getopt() in musl libc follows POSIX and
stops looking for options upon reaching the first non-option argument.
As a result, the '-T' option is misinterpreted as a positional argument,
and the build fails:

  make -f ./scripts/Makefile.modpost
     scripts/mod/modpost   -E   -o Module.symvers vmlinux.o -T modules.order
  -T: No such file or directory
  make[1]: *** [scripts/Makefile.modpost:137: Module.symvers] Error 1
  make: *** [Makefile:1960: modpost] Error 2

The fix is to move all options before 'vmlinux.o' in modpost-args.

Fixes: 3d57e1b7b1d4 ("kbuild: refactor the prerequisites of the modpost rule")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: add a missing line for help message

The help message line for building the source RPM package was missing.
Added it.

Signed-off-by: Jun ASAKA <JunASAKA@zzy040330.moe>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

.gitignore: ignore *.rpm

Previously, *.rpm files were created under $HOME/rpmbuild/, but since
commit 8818039f959b ("kbuild: add ability to make source rpm buildable
using koji"), srcrpm-pkg creates the source rpm in the kernel tree
because it sets '_srcrpmdir'.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

arch: fix broken BuildID for arm64 and riscv

Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux
since commit 994b7ac1697b ("arm64: remove special treatment for the
link order of head.o").

The issue is that the type of .notes section, which contains the BuildID,
changed from NOTES to PROGBITS.

Ard Biesheuvel figured out that whichever object gets linked first gets
to decide the type of a section. The PROGBITS type is the result of the
compiler emitting .note.GNU-stack as PROGBITS rather than NOTE.

While Ard provided a fix for arm64, I want to fix this globally because
the same issue is happening on riscv since commit 2348e6bf4421 ("riscv:
remove special treatment for the link order of head.o"). This problem
will happen in general for other architectures if they start to drop
unneeded entries from scripts/head-object-list.txt.

Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h.

Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/
Fixes: 994b7ac1697b ("arm64: remove special treatment for the link order of head.o")
Fixes: 2348e6bf4421 ("riscv: remove special treatment for the link order of head.o")
Reported-by: Dennis Gilmore <dennis@ausil.us>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

drm/i915/uc: Fix two issues with over-size firmware files

In the case where a firmware file is too large (e.g. someone
downloaded a web page ASCII dump from github...), the firmware object
is released but the pointer is not zerod. If no other firmware file
was found then release would be called again leading to a double kfree.

Also, the size check was only being applied to the initial firmware
load not any of the subsequent attempts. So move the check into a
wrapper that is used for all loads.

Fixes: 016241168dc5 ("drm/i915/uc: use different ggtt pin offsets for uc loads")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221221193031.687266-4-John.C.Harrison@Intel.com
(cherry picked from commit 4071d98b296a5bc5fd4b15ec651bd05800ec9510)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: improve the catch-all evict to handle lock contention

The catch-all evict can fail due to object lock contention, since it
only goes as far as trylocking the object, due to us already holding the
vm->mutex. Doing a full object lock here can deadlock, since the
vm->mutex is always our inner lock. Add another execbuf pass which drops
the vm->mutex and then tries to grab the object will the full lock,
before then retrying the eviction. This should be good enough for now to
fix the immediate regression with userspace seeing -ENOSPC from execbuf
due to contended object locks during GTT eviction.

v2 (Mani)
- Also revamp the docs for the different passes.

Testcase: igt@gem_ppgtt@shrink-vs-evict-*
Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570
References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Mani Milani <mani@chromium.org>
Cc: <stable@vger.kernel.org> # v5.18+
Reviewed-by: Mani Milani <mani@chromium.org>
Tested-by: Mani Milani <mani@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com
(cherry picked from commit 801fa7a81f6da533cc5442fc40e32c72b76cd42a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Remove __maybe_unused from mtl_info

The attribute __maybe_unused should remain only until the respective
info is not in the pciidlist. The info can't be added together
with its definition because that would cause the driver to automatically
probe for the device, while it's still not ready for that. However once
pciidlist contains it, the attribute can be removed.

Fixes: 7835303982d1 ("drm/i915/mtl: Add MeteorLake PCI IDs")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221214194944.3670344-1-lucas.demarchi@intel.com
(cherry picked from commit 50490ce05b7a50b0bd4108fa7d6db3ca2972fa83)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: fix TLB invalidation for Gen12.50 video and compute engines

In case of Gen12.50 video and compute engines, TLB_INV registers are
masked - to modify one bit, corresponding bit in upper half of the register
must be enabled, otherwise nothing happens.

Fixes: 77fa9efc16a9 ("drm/i915/xehp: Create separate reg definitions for new MCR registers")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221214075439.402485-1-andrzej.hajda@intel.com
(cherry picked from commit 4d5cf7b1680a1e6db327e3c935ef58325cbedb2c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"Mostly just NVMe, but also a single fixup for BFQ for a regression
  that happened during the merge window. In detail:

   - NVMe pull requests via Christoph:
      - Fix doorbell buffer value endianness (Klaus Jensen)
      - Fix Linux vs NVMe page size mismatch (Keith Busch)
      - Fix a potential use memory access beyong the allocation limit
        (Keith Busch)
      - Fix a multipath vs blktrace NULL pointer dereference (Yanjun
        Zhang)
      - Fix various problems in handling the Command Supported and
        Effects log (Christoph Hellwig)
      - Don't allow unprivileged passthrough of commands that don't
        transfer data but modify logical block content (Christoph
        Hellwig)
      - Add a features and quirks policy document (Christoph Hellwig)
      - Fix some really nasty code that was correct but made smatch
        complain (Sagi Grimberg)

   - Use-after-free regression in BFQ from this merge window (Yu)"

* tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux:
  nvme-auth: fix smatch warning complaints
  nvme: consult the CSE log page for unprivileged passthrough
  nvme: also return I/O command effects from nvme_command_effects
  nvmet: don't defer passthrough commands with trivial effects to the workqueue
  nvmet: set the LBCC bit for commands that modify data
  nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
  nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
  docs, nvme: add a feature and quirk policy document
  nvme-pci: update sqsize when adjusting the queue depth
  nvme: fix setting the queue depth in nvme_alloc_io_tag_set
  block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
  nvme: fix multipath crash caused by flush request when blktrace is enabled
  nvme-pci: fix page size checks
  nvme-pci: fix mempool alloc size
  nvme-pci: fix doorbell buffer value endianness

Merge tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Two fixes for mutex grabbing when the task state is != TASK_RUNNING
   (me)

- Check for invalid opcode in io_uring_register() a bit earlier, to
   avoid going through the quiesce machinery just to return -EINVAL
   later in the process (me)

- Fix for the uapi io_uring header, skipping including time_types.h
   when necessary (Stefan)

* tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linux:
  uapi:io_uring.h: allow linux/time_types.h to be skipped
  io_uring: check for valid register opcode earlier
  io_uring/cancel: re-grab ctx mutex after finishing wait
  io_uring: finish waiting before flushing overflow entries

Merge tag 'linux-kselftest-kunit-fixes-6.2-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull KUnit fix from Shuah Khan:

- alloc_string_stream_fragment() error path fix to free before
returning a failure.

* tag 'linux-kselftest-kunit-fixes-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: alloc_string_stream_fragment error handling bug fix

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Changes that were posted too late for 6.1, or after the release.

  x86:

   - several fixes to nested VMX execution controls

   - fixes and clarification to the documentation for Xen emulation

   - do not unnecessarily release a pmu event with zero period

   - MMU fixes

   - fix Coverity warning in kvm_hv_flush_tlb()

  selftests:

   - fixes for the ucall mechanism in selftests

   - other fixes mostly related to compilation with clang"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (41 commits)
  KVM: selftests: restore special vmmcall code layout needed by the harness
  Documentation: kvm: clarify SRCU locking order
  KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET
  KVM: x86/xen: Documentation updates and clarifications
  KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi
  KVM: x86/xen: Simplify eventfd IOCTLs
  KVM: x86/xen: Fix SRCU/RCU usage in readers of evtchn_ports
  KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly
  KVM: x86/xen: Fix memory leak in kvm_xen_write_hypercall_page()
  KVM: Delete extra block of "};" in the KVM API documentation
  kvm: x86/mmu: Remove duplicated "be split" in spte.h
  kvm: Remove the unused macro KVM_MMU_READ_{,UN}LOCK()
  MAINTAINERS: adjust entry after renaming the vmx hyperv files
  KVM: selftests: Mark correct page as mapped in virt_map()
  KVM: arm64: selftests: Don't identity map the ucall MMIO hole
  KVM: selftests: document the default implementation of vm_vaddr_populate_bitmap
  KVM: selftests: Use magic value to signal ucall_alloc() failure
  KVM: selftests: Disable "gnu-variable-sized-type-not-at-end" warning
  KVM: selftests: Include lib.mk before consuming $(CC)
  KVM: selftests: Explicitly disable builtins for mem*() overrides
  ...

Merge tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme into block-6.2

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.2

- fix various problems in handling the Command Supported and Effects log
   (Christoph Hellwig)
- don't allow unprivileged passthrough of commands that don't transfer
   data but modify logical block content (Christoph Hellwig)
- add a features and quirks policy document (Christoph Hellwig)
- fix some really nasty code that was correct but made smatch complain
   (Sagi Grimberg)"

* tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme:
  nvme-auth: fix smatch warning complaints
  nvme: consult the CSE log page for unprivileged passthrough
  nvme: also return I/O command effects from nvme_command_effects
  nvmet: don't defer passthrough commands with trivial effects to the workqueue
  nvmet: set the LBCC bit for commands that modify data
  nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
  nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
  docs, nvme: add a feature and quirk policy document

kconfig: Add static text for search information in help menu

Add few static text to explain how one can bring up the search dialog
box by pressing the forward slash key anywhere on this interface.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

nvme-auth: fix smatch warning complaints

When initializing auth context, there may be no secrets passed
by the user. Make return code explicit when returning successfully.

smatch warnings:
drivers/nvme/host/auth.c:950 nvme_auth_init_ctrl() warn: missing error code? 'ret'

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: consult the CSE log page for unprivileged passthrough

Commands like Write Zeros can change the contents of a namespaces without
actually transferring data. To protect against this, check the Commands
Supported and Effects log is supported by the controller for any
unprivileg command passthrough and refuse unprivileged passthrough if the
command has any effects that can change data or metadata.

Note: While the Commands Support and Effects log page has only been
mandatory since NVMe 2.0, it is widely supported because Windows requires
it for any command passthrough from userspace.

Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

nvme: also return I/O command effects from nvme_command_effects

To be able to use the Commands Supported and Effects Log for allowing
unprivileged passtrough, it needs to be corretly reported for I/O
commands as well. Return the I/O command effects from
nvme_command_effects, and also add a default list of effects for the
NVM command set. For other command sets, the Commands Supported and
Effects log is required to be present already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

nvmet: don't defer passthrough commands with trivial effects to the workqueue

Mask out the "Command Supported" and "Logical Block Content Change" bits
and only defer execution of commands that have non-trivial effects to
the workqueue for synchronous execution. This allows to execute admin
commands asynchronously on controllers that provide a Command Supported
and Effects log page, and will keep allowing to execute Write commands
asynchronously once command effects on I/O commands are taken into
account.

Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

nvmet: set the LBCC bit for commands that modify data

Write, Write Zeroes, Zone append and a Zone Reset through
Zone Management Send modify the logical block content of a namespace,
so make sure the LBCC bit is reported for them.

Fixes: b5d0b38c0475 ("nvmet: add Command Set Identifier support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it

Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a
single value to multiple array entries instead of repeated assignments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition

3 << 16 does not generate the correct mask for bits 16, 17 and 18.
Use the GENMASK macro to generate the correct mask instead.

Fixes: 84fef62d135b ("nvme: check admin passthru command effects")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

docs, nvme: add a feature and quirk policy document

This adds a document about what specification features are supported by
the Linux NVMe driver, and what qualifies for a quirk if an implementation
has problems following the specification.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>

ALSA: hda/hdmi: Static PCM mapping again with AMD HDMI codecs

The recent code refactoring for HD-audio HDMI codec driver caused a
regression on AMD/ATI HDMI codecs; namely, PulseAudioand pipewire
don't recognize HDMI outputs any longer while the direct output via
ALSA raw access still works.

The problem turned out that, after the code refactoring, the driver
assumes only the dynamic PCM assignment, and when a PCM stream that
still isn't assigned to any pin gets opened, the driver tries to
assign any free converter to the PCM stream.  This behavior is OK for
Intel and other codecs, as they have arbitrary connections between
pins and converters.  OTOH, on AMD chips that have a 1:1 mapping
between pins and converters, this may end up with blocking the open of
the next PCM stream for the pin that is tied with the formerly taken
converter.

Also, with the code refactoring, more PCM streams are exposed than
necessary as we assume all converters can be used, while this isn't
true for AMD case.  This may change the PCM stream assignment and
confuse users as well.

This patch fixes those problems by:

- Introducing a flag spec->static_pcm_mapping, and if it's set, the
  driver applies the static mapping between pins and converters at the
  probe time
- Limiting the number of PCM streams per pins, too; this avoids the
  superfluous PCM streams

Fixes: ef6f5494faf6 ("ALSA: hda/hdmi: Use only dynamic PCM device allocation")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216836
Co-developed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20221228125714.16329-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'kvm-late-6.1-fixes' into HEAD

x86:

* several fixes to nested VMX execution controls

* fixes and clarification to the documentation for Xen emulation

* do not unnecessarily release a pmu event with zero period

* MMU fixes

* fix Coverity warning in kvm_hv_flush_tlb()

selftests:

* fixes for the ucall mechanism in selftests

* other fixes mostly related to compilation with clang

KVM: selftests: restore special vmmcall code layout needed by the harness

Commit 8fda37cf3d41 ("KVM: selftests: Stuff RAX/RCX with 'safe' values
in vmmcall()/vmcall()", 2022-11-21) broke the svm_nested_soft_inject_test
because it placed a "pop rbp" instruction after vmmcall. While this is
correct and mimics what is done in the VMX case, this particular test
expects a ud2 instruction right after the vmmcall, so that it can skip
over it in the L1 part of the test.

Inline a suitably-modified version of vmmcall() to restore the
functionality of the test.

Fixes: 8fda37cf3d41 ("KVM: selftests: Stuff RAX/RCX with 'safe' values in vmmcall()/vmcall()"
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221130181147.9911-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Documentation: kvm: clarify SRCU locking order

Currently only the locking order of SRCU vs kvm->slots_arch_lock
and kvm->slots_lock is documented. Extend this to kvm->lock
since Xen emulation got it terribly wrong.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET

While KVM_XEN_EVTCHN_RESET is usually called with no vCPUs running,
if that happened it could cause a deadlock. This is due to
kvm_xen_eventfd_reset() doing a synchronize_srcu() inside
a kvm->lock critical section.

To avoid this, first collect all the evtchnfd objects in an
array and free all of them once the kvm->lock critical section
is over and th SRCU grace period has expired.

Reported-by: Michal Luczaj <mhal@rbox.co>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

uapi:io_uring.h: allow linux/time_types.h to be skipped

include/uapi/linux/io_uring.h is synced 1:1 into
liburing:src/include/liburing/io_uring.h.

liburing has a configure check to detect the need for
linux/time_types.h. It can opt-out by defining
UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H

Fixes: 78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()")
Link: https://github.com/axboe/liburing/issues/708
Link: https://github.com/axboe/liburing/pull/709
Link: https://lore.kernel.org/io-uring/20221115212614.1308132-1-ammar.faizi@intel.com/T/#m9f5dd571cd4f6a5dee84452dbbca3b92ba7a4091
CC: Jens Axboe <axboe@kernel.dk>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Link: https://lore.kernel.org/r/7071a0a1d751221538b20b63f9160094fc7e06f4.1668630247.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error

In a scenario where kcalloc() fails to allocate memory, the futex_waitv
system call immediately returns -ENOMEM without invoking
destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this
results in leaking a timer debug object.

Fixes: bf69bad38cf6 ("futex: Implement sys_futex_waitv()")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org # v5.16+
Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios.com

x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK

Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
speculative execution after function return, kprobe jump optimization
always fails on the functions with such INT3 inside the function body.
(It already checks the INT3 padding between functions, but not inside
the function)

To avoid this issue, as same as kprobes, check whether the INT3 comes
from kgdb or not, and if so, stop decoding and make it fail. The other
INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be
treated as a one-byte instruction.

Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/167146051929.1374301.7419382929328081706.stgit@devnote3

x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK

Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
speculative execution after RET instruction, kprobes always failes to
check the probed instruction boundary by decoding the function body if
the probed address is after such sequence. (Note that some conditional
code blocks will be placed after function return, if compiler decides
it is not on the hot path.)

This is because kprobes expects kgdb puts the INT3 as a software
breakpoint and it will replace the original instruction.
But these INT3 are not such purpose, it doesn't need to recover the
original instruction.

To avoid this issue, kprobes checks whether the INT3 is owned by
kgdb or not, and if so, stop decoding and make it fail. The other
INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be
treated as a one-byte instruction.

Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/167146051026.1374301.392728975473572291.stgit@devnote3

x86/calldepth: Fix incorrect init section references

The addition of callthunks_translate_call_dest means that
skip_addr() and patch_dest() can no longer be discarded
as part of the __init section freeing:

WARNING: modpost: vmlinux.o: section mismatch in reference: callthunks_translate_call_dest.cold (section: .text.unlikely) -> skip_addr (section: .init.text)
WARNING: modpost: vmlinux.o: section mismatch in reference: callthunks_translate_call_dest.cold (section: .text.unlikely) -> patch_dest (section: .init.text)
WARNING: modpost: vmlinux.o: section mismatch in reference: is_callthunk.cold (section: .text.unlikely) -> skip_addr (section: .init.text)
ERROR: modpost: Section mismatches detected.
Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.

Fixes: b2e9dfe54be4 ("x86/bpf: Emit call depth accounting if required")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221215164334.968863-1-arnd@kernel.org

perf/core: Call LSM hook after copying perf_event_attr

It passes the attr struct to the security_perf_event_open() but it's
not initialized yet.

Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org

perf: Fix use-after-free in error path

The syscall error path has a use-after-free; put_pmu_ctx() will
reference ctx, therefore we must ensure ctx is destroyed after pmu_ctx
is.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: syzbot+b8e8c01c8ade4fe6e48f@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lkml.kernel.org/r/Y6B3xEgkbmFUCeni@hirez.programming.kicks-ass.net

perf/x86/amd: fix potential integer overflow on shift of a int

The left shift of int 32 bit integer constant 1 is evaluated using 32 bit
arithmetic and then passed as a 64 bit function argument. In the case where
i is 32 or more this can lead to an overflow. Avoid this by shifting
using the BIT_ULL macro instead.

Fixes: 471af006a747 ("perf/x86/amd: Constrain Large Increment per Cycle events")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lore.kernel.org/r/20221202135149.1797974-1-colin.i.king@gmail.com

perf/core: Fix cgroup events tracking

We encounter perf warnings when using cgroup events like:

  cd /sys/fs/cgroup
  mkdir test
  perf stat -e cycles -a -G test

Which then triggers:

  WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
  Call Trace:
   <TASK>
   __schedule+0x4ae/0x9f0
   ? _raw_spin_unlock_irqrestore+0x23/0x40
   ? __cond_resched+0x18/0x20
   preempt_schedule_common+0x2d/0x70
   __cond_resched+0x18/0x20
   wait_for_completion+0x2f/0x160
   ? cpu_stop_queue_work+0x9e/0x130
   affine_move_task+0x18a/0x4f0

  WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
  Call Trace:
   <TASK>
   ? ctx_sched_out+0xb7/0x1b0
   perf_cgroup_switch+0x88/0xc0
   __schedule+0x4ae/0x9f0
   ? _raw_spin_unlock_irqrestore+0x23/0x40
   ? __cond_resched+0x18/0x20
   preempt_schedule_common+0x2d/0x70
   __cond_resched+0x18/0x20
   wait_for_completion+0x2f/0x160
   ? cpu_stop_queue_work+0x9e/0x130
   affine_move_task+0x18a/0x4f0

The above two warnings are not complete here since I remove other
unimportant information. The problem is caused by the perf cgroup
events tracking:

  CPU0 CPU1
  perf_event_open()
    perf_event_alloc()
      account_event()
account_event_cpu()
  atomic_inc(perf_cgroup_events)
  __perf_event_task_sched_out()
    if (atomic_read(perf_cgroup_events))
      perf_cgroup_switch()
// kernel/events/core.c:849
WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
  return
perf_ctx_lock()
ctx_sched_out()
cpuctx->cgrp = cgrp
ctx_sched_in()
  perf_cgroup_set_timestamp()
    // kernel/events/core.c:829
    WARN_ON_ONCE(!ctx->nr_cgroups)
perf_ctx_unlock()
    perf_install_in_context()
      cpu_function_call()
  __perf_install_in_context()
    add_event_to_ctx()
      list_add_event()
perf_cgroup_event_enable()
  ctx->nr_cgroups++
  cpuctx->cgrp = X

We can see from above that we wrongly use percpu atomic perf_cgroup_events
to check if we need to perf_cgroup_switch(), which should only be used
when we know this CPU has cgroup events enabled.

The commit bd2756811766 ("perf: Rewrite core context handling") change
to have only one context per-CPU, so we can just use cpuctx->cgrp to
check if this CPU has cgroup events enabled.

So percpu atomic perf_cgroup_events is not needed.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com

perf core: Return error pointer if inherit_event() fails to find pmu_ctx

inherit_event() returns NULL only when it finds orphaned events
otherwise it returns either valid child_event pointer or an error
pointer. Follow the same when it fails to find pmu_ctx.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221118051539.820-1-ravi.bangoria@amd.com

KVM: x86/xen: Documentation updates and clarifications

Most notably, the KVM_XEN_EVTCHN_RESET feature had escaped documentation
entirely. Along with how to turn most stuff off on SHUTDOWN_soft_reset.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi

These are (uint64_t)-1 magic values are a userspace ABI, allowing the
shared info pages and other enlightenments to be disabled. This isn't
a Xen ABI because Xen doesn't let the guest turn these off except with
the full SHUTDOWN_soft_reset mechanism. Under KVM, the userspace VMM is
expected to handle soft reset, and tear down the kernel parts of the
enlightenments accordingly.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/xen: Simplify eventfd IOCTLs

Port number is validated in kvm_xen_setattr_evtchn().
Remove superfluous checks in kvm_xen_eventfd_assign() and
kvm_xen_eventfd_update().

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20221222203021.1944101-3-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/xen: Fix SRCU/RCU usage in readers of evtchn_ports

The evtchnfd structure itself must be protected by either kvm->lock or
SRCU. Use the former in kvm_xen_eventfd_update(), since the lock is
being taken anyway; kvm_xen_hcall_evtchn_send() instead is a reader and
does not need kvm->lock, and is called in SRCU critical section from the
kvm_x86_handle_exit function.

It is also important to use rcu_read_{lock,unlock}() in
kvm_xen_hcall_evtchn_send(), because idr_remove() will *not*
use synchronize_srcu() to wait for readers to complete.

Remove a superfluous if (kvm) check before calling synchronize_srcu()
in kvm_xen_eventfd_deassign() where kvm has been dereferenced already.

Co-developed-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly

In particular, we shouldn't assume that being contiguous in guest virtual
address space means being contiguous in guest *physical* address space.

In dropping the manual calls to kvm_mmu_gva_to_gpa_system(), also drop
the srcu_read_lock() that was around them. All call sites are reached
from kvm_xen_hypercall() which is called from the handle_exit function
with the read lock already held.

536395260 ("KVM: x86/xen: handle PV timers oneshot mode")
1a65105a5 ("KVM: x86/xen: handle PV spinlocks slowpath")

Fixes: 2fd6df2f2 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/xen: Fix memory leak in kvm_xen_write_hypercall_page()

Release page irrespectively of kvm_vcpu_write_guest() return value.

Suggested-by: Paul Durrant <paul@xen.org>
Fixes: 23200b7a30de ("KVM: x86/xen: intercept xen hypercalls if enabled")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20221220151454.712165-1-mhal@rbox.co>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Delete extra block of "};" in the KVM API documentation

Delete an extra block of code/documentation that snuck in when KVM's
documentation was converted to ReST format.

Fixes: 106ee47dc633 ("docs: kvm: Convert api.txt to ReST format")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221207003637.2041211-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86/mmu: Remove duplicated "be split" in spte.h

"be split be split" -> "be split"

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20221207120505.9175-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: Remove the unused macro KVM_MMU_READ_{,UN}LOCK()

No code is using KVM_MMU_READ_LOCK() or KVM_MMU_READ_UNLOCK().  They
used to be in virt/kvm/pfncache.c:

                KVM_MMU_READ_LOCK(kvm);
                retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
                KVM_MMU_READ_UNLOCK(kvm);

However, since 58cd407ca4c6 ("KVM: Fix multiple races in gfn=>pfn cache
refresh", 2022-05-25) the code is only relying on the MMU notifier's
invalidation count and sequence number.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20221207120617.9409-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

MAINTAINERS: adjust entry after renaming the vmx hyperv files

Commit a789aeba4196 ("KVM: VMX: Rename "vmx/evmcs.{ch}" to
"vmx/hyperv.{ch}"") renames the VMX specific Hyper-V files, but does not
adjust the entry in MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.

Repair this file reference in KVM X86 HYPER-V (KVM/hyper-v).

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Fixes: a789aeba4196 ("KVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}"")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221205082044.10141-1-lukas.bulwahn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Mark correct page as mapped in virt_map()

The loop marks vaddr as mapped after incrementing it by page size,
thereby marking the *next* page as mapped. Set the bit in vpages_mapped
first instead.

Fixes: 56fc7732031d ("KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221209015307.1781352-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: selftests: Don't identity map the ucall MMIO hole

Currently the ucall MMIO hole is placed immediately after slot0, which
is a relatively safe address in the PA space. However, it is possible
that the same address has already been used for something else (like the
guest program image) in the VA space. At least in my own testing,
building the vgic_irq test with clang leads to the MMIO hole appearing
underneath gicv3_ops.

Stop identity mapping the MMIO hole and instead find an unused VA to map
to it. Yet another subtle detail of the KVM selftests library is that
virt_pg_map() does not update vm->vpages_mapped. Switch over to
virt_map() instead to guarantee that the chosen VA isn't to something
else.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221209015307.1781352-6-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: document the default implementation of vm_vaddr_populate_bitmap

Explain the meaning of the bit manipulations of vm_vaddr_populate_bitmap.
These correspond to the "canonical addresses" of x86 and other
architectures, but that is not obvious.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Use magic value to signal ucall_alloc() failure

Use a magic value to signal a ucall_alloc() failure instead of simply
doing GUEST_ASSERT(). GUEST_ASSERT() relies on ucall_alloc() and so a
failure puts the guest into an infinite loop.

Use -1 as the magic value, as a real ucall struct should never wrap.

Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Disable "gnu-variable-sized-type-not-at-end" warning

Disable gnu-variable-sized-type-not-at-end so that tests and libraries
can create overlays of variable sized arrays at the end of structs when
using a fixed number of entries, e.g. to get/set a single MSR.

It's possible to fudge around the warning, e.g. by defining a custom
struct that hardcodes the number of entries, but that is a burden for
both developers and readers of the code.

lib/x86_64/processor.c:664:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
                struct kvm_msrs header;
                                ^
lib/x86_64/processor.c:772:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
                struct kvm_msrs header;
                                ^
lib/x86_64/processor.c:787:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
                struct kvm_msrs header;
                                ^
3 warnings generated.

x86_64/hyperv_tlb_flush.c:54:18: warning: field 'hv_vp_set' with variable sized type 'struct hv_vpset'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
        struct hv_vpset hv_vp_set;
                        ^
1 warning generated.

x86_64/xen_shinfo_test.c:137:25: warning: field 'info' with variable sized type 'struct kvm_irq_routing'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
        struct kvm_irq_routing info;
                               ^
1 warning generated.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Include lib.mk before consuming $(CC)

Include lib.mk before consuming $(CC) and document that lib.mk overwrites
$(CC) unless make was invoked with -e or $(CC) was specified after make
(which makes the environment override the Makefile). Including lib.mk
after using it for probing, e.g. for -no-pie, can lead to weirdness.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Explicitly disable builtins for mem*() overrides

Explicitly disable the compiler's builtin memcmp(), memcpy(), and
memset().  Because only lib/string_override.c is built with -ffreestanding,
the compiler reserves the right to do what it wants and can try to link the
non-freestanding code to its own crud.

  /usr/bin/x86_64-linux-gnu-ld: /lib/x86_64-linux-gnu/libc.a(memcmp.o): in function `memcmp_ifunc':
  (.text+0x0): multiple definition of `memcmp'; tools/testing/selftests/kvm/lib/string_override.o:
  tools/testing/selftests/kvm/lib/string_override.c:15: first defined here
  clang: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 6b6f71484bf4 ("KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use")
Reported-by: Aaron Lewis <aaronlewis@google.com>
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Probe -no-pie with actual CFLAGS used to compile

Probe -no-pie with the actual set of CFLAGS used to compile the tests,
clang whines about -no-pie being unused if the tests are compiled with
-static.

clang: warning: argument unused during compilation: '-no-pie'
[-Wunused-command-line-argument]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Use proper function prototypes in probing code

Make the main() functions in the probing code proper prototypes so that
compiling the probing code with more strict flags won't generate false
negatives.

<stdin>:1:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-8-seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Rename UNAME_M to ARCH_DIR, fill explicitly for x86

Rename UNAME_M to ARCH_DIR and explicitly set it directly for x86. At
this point, the name of the arch directory really doesn't have anything
to do with `uname -m`, and UNAME_M is unnecessarily confusing given that
its purpose is purely to identify the arch specific directory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Fix a typo in x86-64's kvm_get_cpu_address_width()

Fix a == vs. = typo in kvm_get_cpu_address_width() that results in
@pa_bits being left unset if the CPU doesn't support enumerating its
MAX_PHY_ADDR. Flagged by clang's unusued-value warning.

lib/x86_64/processor.c:1034:51: warning: expression result unused [-Wunused-value]
*pa_bits == kvm_cpu_has(X86_FEATURE_PAE) ? 36 : 32;

Fixes: 3bd396353d18 ("KVM: selftests: Add X86_FEATURE_PAE and use it calc "fallback" MAXPHYADDR")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213001653.3852042-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Use pattern matching in .gitignore

Use pattern matching to exclude everything except .c, .h, .S, and .sh
files from Git. Manually adding every test target has an absurd
maintenance cost, is comically error prone, and leads to bikeshedding
over whether or not the targets should be listed in alphabetical order.

Deliberately do not include the one-off assets, e.g. config, settings,
.gitignore itself, etc as Git doesn't ignore files that are already in
the repository. Adding the one-off assets won't prevent mistakes where
developers forget to --force add files that don't match the "allowed".

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Fix divide-by-zero bug in memslot_perf_test

Check that the number of pages per slot is non-zero in get_max_slots()
prior to computing the remaining number of pages.  clang generates code
that uses an actual DIV for calculating the remaining, which causes a #DE
if the total number of pages is less than the number of slots.

  traps: memslot_perf_te[97611] trap divide error ip:4030c4 sp:7ffd18ae58f0
         error:0 in memslot_perf_test[401000+cb000]

Fixes: a69170c65acd ("KVM: selftests: memslot_perf_test: Report optimal memory slots")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213001653.3852042-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Delete dead code in x86_64/vmx_tsc_adjust_test.c

Delete an unused struct definition in x86_64/vmx_tsc_adjust_test.c.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Define literal to asm constraint in aarch64 as unsigned long

Define a literal '0' asm input constraint to aarch64/page_fault_test's
guest_cas() as an unsigned long to make clang happy.

  tools/testing/selftests/kvm/aarch64/page_fault_test.c:120:16: error:
    value size does not match register size specified by the constraint
    and modifier [-Werror,-Wasm-operand-widths]
                       :: "r" (0), "r" (TEST_DATA), "r" (guest_test_memory));
                               ^
  tools/testing/selftests/kvm/aarch64/page_fault_test.c:119:15: note:
    use constraint modifier "w"
                       "casal %0, %1, [%2]\n"
                              ^~
                              %w0

Fixes: 35c581015712 ("KVM: selftests: aarch64: Add aarch64/page_fault_test")
Cc: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213001653.3852042-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ata: ahci: Fix PCS quirk application for suspend

Since kernel 5.3.4 my laptop (ICH8M controller) does not see Kingston
SV300S37A60G SSD disk connected into a SATA connector on wake from
suspend. The problem was introduced in c312ef176399 ("libata/ahci: Drop
PCS quirk for Denverton and beyond"): the quirk is not applied on wake
from suspend as it originally was.

It is worth to mention the commit contained another bug: the quirk is
not applied at all to controllers which require it. The fix commit
09d6ac8dc51a ("libata/ahci: Fix PCS quirk application") landed in 5.3.8.
So testing my patch anywhere between commits c312ef176399 and
09d6ac8dc51a is pointless.

Not all disks trigger the problem. For example nothing bad happens with
Western Digital WD5000LPCX HDD.

Test hardware:
- Acer 5920G with ICH8M SATA controller
- sda: some SATA HDD connnected into the DVD drive IDE port with a
SATA-IDE caddy. It is a boot disk
- sdb: Kingston SV300S37A60G SSD connected into the only SATA port

Sample "dmesg --notime | grep -E '^(sd |ata)'" output on wake:

sd 0:0:0:0: [sda] Starting disk
sd 2:0:0:0: [sdb] Starting disk
ata4: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
ata1.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
ata1: FORCE: cable set to 80c
ata5: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3.00: disabled
sd 2:0:0:0: rejecting I/O to offline device
ata3.00: detaching (SCSI 2:0:0:0)
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Synchronizing SCSI cache
sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Stopping disk
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK

Commit c312ef176399 dropped ahci_pci_reset_controller() which internally
calls ahci_reset_controller() and applies the PCS quirk if needed after
that. It was called each time a reset was required instead of just
ahci_reset_controller(). This patch puts the function back in place.

Fixes: c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond")
Signed-off-by: Adam Vodopjan <grozzly@protonmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

kunit: alloc_string_stream_fragment error handling bug fix

When it fails to allocate fragment, it does not free and return error.
And check the pointer inappropriately.

Fixed merge conflicts with
commit 618887768bb7 ("kunit: update NULL vs IS_ERR() tests")
Shuah Khan <skhan@linuxfoundation.org>

Signed-off-by: YoungJun.park <her0gyugyu@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

nvme-pci: update sqsize when adjusting the queue depth

Update the core sqsize field in addition to the PCIe-specific
q_depth field as the core tagset allocation helpers rely on it.

Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/20221225103234.226794-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: fix setting the queue depth in nvme_alloc_io_tag_set

While the CAP.MQES field in NVMe is a 0s based filed with a natural one
off, we also need to account for the queue wrap condition and fix undo
the one off again in nvme_alloc_io_tag_set. This was never properly
done by the fabrics drivers, but they don't seem to care because there
is no actual physical queue that can wrap around, but it became a
problem when converting over the PCIe driver. Also add back the
BLK_MQ_MAX_DEPTH check that was lost in the same commit.

Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/20221225103234.226794-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq

Commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'")
will access 'bic->bfqq' in bic_set_bfqq(), however, bfq_exit_icq_bfqq()
can free bfqq first, and then call bic_set_bfqq(), which will cause uaf.

Fix the problem by moving bfq_exit_bfqq() behind bic_set_bfqq().

Fixes: 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20221226030605.1437081-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>