review.tizen.org Git - platform/kernel/linux-rpi.git/log

projects / platform / kernel / linux-rpi.git / log

Dave Airlie [Fri, 13 May 2022 22:34:01 +0000 (08:34 +1000)]

Merge tag 'drm-misc-fixes-2022-05-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Multiple fixes to fbdev to address a regression at unregistration, an
iommu detection improvement for nouveau, a memory leak fix for nouveau,
pointer dereference fix for dma_buf_file_release(), and a build breakage
fix for vc4

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220513073044.ymayac7x7bzatrt7@houat

commit | commitdiff | tree

Dave Airlie [Fri, 13 May 2022 22:29:41 +0000 (08:29 +1000)]

Merge tag 'vmwgfx-drm-fixes-5.18-2022-05-13' of https://gitlab.freedesktop.org/zack/vmwgfx into drm-fixes

vmwgfx fixes for:
- Black screen due to fences using FIFO checks on SVGA3
- Random black screens on boot due to uninitialized drm_mode_fb_cmd2
- Hangs on SVGA3 due to command buffers being used with gbobjects

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a1d32799e4c74b8540216376d7576bb783ca07ba.camel@vmware.com

commit | commitdiff | tree

Zack Rusin [Fri, 18 Mar 2022 17:43:31 +0000 (13:43 -0400)]

drm/vmwgfx: Disable command buffers on svga3 without gbobjects

With very limited vram on svga3 it's difficult to handle all the surface
migrations. Without gbobjects, i.e. the ability to store surfaces in
guest mobs, there's no reason to support intermediate svga2 features,
especially because we can fall back to fb traces and svga3 will never
support those in-between features.

On svga3 we wither want to use fb traces or screen targets
(i.e. gbobjects), nothing in between. This fixes presentation on a lot
of fusion/esxi tech previews where the exposed svga3 caps haven't been
finalized yet.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: 2cd80dbd3551 ("drm/vmwgfx: Add basic support for SVGA3")
Cc: <stable@vger.kernel.org> # v5.14+
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220318174332.440068-5-zack@kde.org

commit | commitdiff | tree

Zack Rusin [Wed, 2 Mar 2022 15:24:24 +0000 (10:24 -0500)]

drm/vmwgfx: Initialize drm_mode_fb_cmd2

Transition to drm_mode_fb_cmd2 from drm_mode_fb_cmd left the structure
unitialized. drm_mode_fb_cmd2 adds a few additional members, e.g. flags
and modifiers which were never initialized. Garbage in those members
can cause random failures during the bringup of the fbcon.

Initializing the structure fixes random blank screens after bootup due
to flags/modifiers mismatches during the fbcon bring up.

Fixes: dabdcdc9822a ("drm/vmwgfx: Switch to mode_cmd2")
Signed-off-by: Zack Rusin <zackr@vmware.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302152426.885214-7-zack@kde.org

commit | commitdiff | tree

Zack Rusin [Wed, 2 Mar 2022 15:24:22 +0000 (10:24 -0500)]

drm/vmwgfx: Fix fencing on SVGAv3

Port of the vmwgfx to SVGAv3 lacked support for fencing. SVGAv3 removed
FIFO's and replaced them with command buffers and extra registers.
The initial version of SVGAv3 lacked support for most advanced features
(e.g. 3D) which made fences unnecessary. That is no longer the case,
especially as 3D support is being turned on.

Switch from FIFO commands and capabilities to command buffers and extra
registers to enable fences on SVGAv3.

Fixes: 2cd80dbd3551 ("drm/vmwgfx: Add basic support for SVGA3")
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302152426.885214-5-zack@kde.org

commit | commitdiff | tree

Dave Airlie [Fri, 13 May 2022 00:40:55 +0000 (10:40 +1000)]

Merge tag 'amd-drm-fixes-5.18-2022-05-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.18-2022-05-11:

amdgpu:
- Disable ASPM for VI boards on ADL platforms
- S0ix DCN3.1 display fix
- Resume regression fix
- Stable pstate fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220511174422.5769-1-alexander.deucher@amd.com

commit | commitdiff | tree

Dave Airlie [Thu, 12 May 2022 23:24:44 +0000 (09:24 +1000)]

Merge tag 'drm-intel-fixes-2022-05-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Fix for #5732: (Cc stable) kernel memory corruption when running a lot of OpenCL tests in parallel

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YnykW6L4e7vD3yl3@jlahtine-mobl.ger.corp.intel.com

commit | commitdiff | tree

Hui Tang [Tue, 10 May 2022 13:51:48 +0000 (21:51 +0800)]

drm/vc4: hdmi: Fix build error for implicit function declaration

drivers/gpu/drm/vc4/vc4_hdmi.c: In function ‘vc4_hdmi_connector_detect’:
drivers/gpu/drm/vc4/vc4_hdmi.c:228:7: error: implicit declaration of function ‘gpiod_get_value_cansleep’; did you mean ‘gpio_get_value_cansleep’? [-Werror=implicit-function-declaration]
   if (gpiod_get_value_cansleep(vc4_hdmi->hpd_gpio))
       ^~~~~~~~~~~~~~~~~~~~~~~~
       gpio_get_value_cansleep
  CC [M]  drivers/gpu/drm/vc4/vc4_validate.o
  CC [M]  drivers/gpu/drm/vc4/vc4_v3d.o
  CC [M]  drivers/gpu/drm/vc4/vc4_validate_shaders.o
  CC [M]  drivers/gpu/drm/vc4/vc4_debugfs.o
drivers/gpu/drm/vc4/vc4_hdmi.c: In function ‘vc4_hdmi_bind’:
drivers/gpu/drm/vc4/vc4_hdmi.c:2883:23: error: implicit declaration of function ‘devm_gpiod_get_optional’; did you mean ‘devm_clk_get_optional’? [-Werror=implicit-function-declaration]
  vc4_hdmi->hpd_gpio = devm_gpiod_get_optional(dev, "hpd", GPIOD_IN);
                       ^~~~~~~~~~~~~~~~~~~~~~~
                       devm_clk_get_optional
drivers/gpu/drm/vc4/vc4_hdmi.c:2883:59: error: ‘GPIOD_IN’ undeclared (first use in this function); did you mean ‘GPIOF_IN’?
  vc4_hdmi->hpd_gpio = devm_gpiod_get_optional(dev, "hpd", GPIOD_IN);
                                                           ^~~~~~~~
                                                           GPIOF_IN
drivers/gpu/drm/vc4/vc4_hdmi.c:2883:59: note: each undeclared identifier is reported only once for each function it appears in
cc1: all warnings being treated as errors

Fixes: 6800234ceee0 ("drm/vc4: hdmi: Convert to gpiod")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220510135148.247719-1-tanghui20@huawei.com

commit | commitdiff | tree

Maarten Lankhorst [Wed, 11 May 2022 18:22:22 +0000 (20:22 +0200)]

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Requested by Zack for vmwgfx fixes.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

commit | commitdiff | tree

Alex Deucher [Tue, 10 May 2022 14:32:26 +0000 (10:32 -0400)]

drm/amdgpu/ctx: only reset stable pstate if the user changed it (v2)

Check if the requested stable pstate matches the current one before
changing it. This avoids changing the stable pstate on context
destroy if the user never changed it in the first place via the
IOCTL.

v2: compare the current and requested rather than setting a flag (Lijo)

Fixes: 8cda7a4f96e435 ("drm/amdgpu/UAPI: add new CTX OP to get/set stable pstates")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

commit | commitdiff | tree

Alex Deucher [Tue, 10 May 2022 13:37:06 +0000 (09:37 -0400)]

Revert "drm/amd/pm: keep the BACO feature enabled for suspend"

This reverts commit eaa090538e8d21801c6d5f94590c3799e6a528b5.

Commit ebc002e3ee78 ("drm/amdgpu: don't use BACO for reset in S3")
stops using BACO for reset during suspend, so it's no longer
necessary to leave BACO enabled during suspend. This fixes
resume from suspend on the navy flounder dGPU in the ASUS ROG
Strix G513QY.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1982
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

commit | commitdiff | tree

Charan Teja Reddy [Mon, 9 May 2022 19:49:57 +0000 (01:19 +0530)]

dma-buf: call dma_buf_stats_setup after dmabuf is in valid list

When dma_buf_stats_setup() fails, it closes the dmabuf file which
results into the calling of dma_buf_file_release() where it does
list_del(&dmabuf->list_node) with out first adding it to the proper
list. This is resulting into panic in the below path:
__list_del_entry_valid+0x38/0xac
dma_buf_file_release+0x74/0x158
__fput+0xf4/0x428
____fput+0x14/0x24
task_work_run+0x178/0x24c
do_notify_resume+0x194/0x264
work_pending+0xc/0x5f0

Fix it by moving the dma_buf_stats_setup() after dmabuf is added to the
list.

Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Acked-by: T.J. Mercier <tjmercier@google.com>
Cc: <stable@vger.kernel.org> # 5.15.x+
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1652125797-2043-1-git-send-email-quic_charante@quicinc.com

commit | commitdiff | tree

Karol Herbst [Wed, 20 Apr 2022 09:57:20 +0000 (11:57 +0200)]

drm/i915: Fix race in __i915_vma_remove_closed

i915_vma_reopen checked if the vma is closed before without taking the
lock. So multiple threads could attempt removing the vma.

Instead the lock needs to be taken before actually checking.

v2: move struct declaration

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.3+
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5732
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Fixes: 155ab8836caa ("drm/i915: Move object close under its own lock")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220420095720.3331609-1-kherbst@redhat.com
(cherry picked from commit 1df1c79cbb7ac9bf148930be3418973c76ba8dde)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 20:54:17 +0000 (13:54 -0700)]

Linux 5.18-rc6

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 19:42:05 +0000 (12:42 -0700)]

Merge tag 'for-5.18/parisc-3' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc architecture fixes from Helge Deller:
"Some reverts of existing patches, which were necessary because of boot
  issues due to wrong CPU clock handling and cache issues which led to
  userspace segfaults with 32bit kernels. Dave has a whole bunch of
  upcoming cache fixes which I then plan to push in the next merge
  window.

  Other than that just small updates and fixes, e.g. defconfig updates,
  spelling fixes, a clocksource fix, boot topology fixes and a fix for
  /proc/cpuinfo output to satisfy lscpu"

* tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  Revert "parisc: Increase parisc_cache_flush_threshold setting"
  parisc: Mark cr16 clock unstable on all SMP machines
  parisc: Fix typos in comments
  parisc: Change MAX_ADDRESS to become unsigned long long
  parisc: Merge model and model name into one line in /proc/cpuinfo
  parisc: Re-enable GENERIC_CPU_DEVICES for !SMP
  parisc: Update 32- and 64-bit defconfigs
  parisc: Only list existing CPUs in cpu_possible_mask
  Revert "parisc: Fix patch code locking and flushing"
  Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
  Revert "parisc: Mark cr16 CPU clocksource unstable on all SMP machines"

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 18:38:23 +0000 (11:38 -0700)]

Merge tag 'powerpc-5.18-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix the DWARF CFI in our VDSO time functions, allowing gdb to
   backtrace through them correctly.

- Fix a buffer overflow in the papr_scm driver, only triggerable by
   hypervisor input.

- A fix in the recently added QoS handling for VAS (used for
   communicating with coprocessors).

Thanks to Alan Modra, Haren Myneni, Kajol Jain, and Segher Boessenkool.

* tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
  powerpc/vdso: Fix incorrect CFI in gettimeofday.S
  powerpc/pseries/vas: Use QoS credits from the userspace

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 18:21:54 +0000 (11:21 -0700)]

Merge tag 'x86-urgent-2022-05-08' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
"A fix and an email address update:

   - Prevent FPU state corruption.

     The condition in irq_fpu_usable() grants FPU usage when the FPU is
     not used in the kernel. That's just wrong as it does not take the
     fpregs_lock()'ed regions into account. If FPU usage happens within
     such a region from interrupt context, then the FPU state gets
     corrupted.

     That's a long standing bug, which got unearthed by the recent
     changes to the random code.

   - Josh wants to use his kernel.org email address"

* tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Prevent FPU state corruption
  MAINTAINERS: Update Josh Poimboeuf's email address

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 18:18:11 +0000 (11:18 -0700)]

Merge tag 'timers-urgent-2022-05-08' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A fix and an email address update:

   - Mark the NMI safe time accessors notrace to prevent tracer
     recursion when they are selected as trace clocks.

   - John Stultz has a new email address"

* tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Mark NMI safe time accessors as notrace
  MAINTAINERS: Update email address for John Stultz

commit | commitdiff | tree

Helge Deller [Sun, 8 May 2022 17:55:13 +0000 (19:55 +0200)]

Revert "parisc: Increase parisc_cache_flush_threshold setting"

This reverts commit a58e9d0984e8dad53f17ec73ae3c1cc7f8d88151.

Triggers segfaults with 32-bit kernels on PA8500 machines.

Signed-off-by: Helge Deller <deller@gmx.de>

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 18:10:17 +0000 (11:10 -0700)]

Merge tag 'irq-urgent-2022-05-08' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
"A fix for the threaded interrupt core.

  A quick sequence of request/free_irq() can result in a hang because
  the interrupt thread did not reach the thread function and got stopped
  in the kthread core already. That leaves a state active counter
  arround which makes a invocation of synchronized_irq() on that
  interrupt hang forever.

  Ensure that the thread reached the thread function in request_irq() to
  prevent that"

* tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Synchronize interrupt thread startup

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 18:01:20 +0000 (11:01 -0700)]

Merge tag 'locking-urgent-2022-05-08' of git://git./linux/kernel/git/tip/tip

Pull locking fixlet from Thomas Gleixner:
"Just a email address update for MAINTAINERS and mailmap"

* tag 'locking-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: MAINTAINERS, .mailmap: Update André's email address

commit | commitdiff | tree

Helge Deller [Sun, 8 May 2022 16:25:00 +0000 (18:25 +0200)]

parisc: Mark cr16 clock unstable on all SMP machines

The cr16 interval timers are not synchronized across CPUs, even with just
one dual-core CPU. This becomes visible if the machines have a longer
uptime.

Signed-off-by: Helge Deller <deller@gmx.de>

commit | commitdiff | tree

Julia Lawall [Sat, 30 Apr 2022 19:07:18 +0000 (21:07 +0200)]

parisc: Fix typos in comments

Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Helge Deller <deller@gmx.de>

commit | commitdiff | tree

Helge Deller [Tue, 5 Apr 2022 19:28:37 +0000 (21:28 +0200)]

parisc: Change MAX_ADDRESS to become unsigned long long

Dave noticed that for the 32-bit kernel MAX_ADDRESS should be a ULL,
otherwise this define would become 0:
MAX_ADDRESS (1UL << MAX_ADDRBITS)
It has no real effect on the kernel.

Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>

commit | commitdiff | tree

Helge Deller [Sun, 3 Apr 2022 19:57:51 +0000 (21:57 +0200)]

parisc: Merge model and model name into one line in /proc/cpuinfo

The Linux tool "lscpu" shows the double amount of CPUs if we have
"model" and "model name" in two different lines in /proc/cpuinfo.
This change combines the model and the model name into one line.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org

commit | commitdiff | tree

Helge Deller [Fri, 1 Apr 2022 20:24:20 +0000 (22:24 +0200)]

parisc: Re-enable GENERIC_CPU_DEVICES for !SMP

In commit 62773112acc5 ("parisc: Switch from GENERIC_CPU_DEVICES to
GENERIC_ARCH_TOPOLOGY") GENERIC_CPU_DEVICES was unconditionally turned
off, but this triggers a warning in topology_add_dev(). Turning it back
on for the !SMP case avoids this warning.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 62773112acc5 ("parisc: Switch from GENERIC_CPU_DEVICES to GENERIC_ARCH_TOPOLOGY")
Signed-off-by: Helge Deller <deller@gmx.de>

commit | commitdiff | tree

Helge Deller [Fri, 1 Apr 2022 16:55:27 +0000 (18:55 +0200)]

parisc: Update 32- and 64-bit defconfigs

Enable CONFIG_CGROUPS=y on 32-bit defconfig for systemd-support, and
enable CONFIG_NAMESPACES and CONFIG_USER_NS.

Signed-off-by: Helge Deller <deller@gmx.de>

commit | commitdiff | tree

Helge Deller [Fri, 1 Apr 2022 07:19:11 +0000 (09:19 +0200)]

parisc: Only list existing CPUs in cpu_possible_mask

The inventory knows which CPUs are in the system, so this bitmask should
be in cpu_possible_mask instead of the bitmask based on CONFIG_NR_CPUS.

Reset the cpu_possible_mask before scanning the system for CPUs, and
mark each existing CPU as possible during initialization of that CPU.

This avoids those warnings later on too:

register_cpu_capacity_sysctl: too early to get CPU4 device!

Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>

commit | commitdiff | tree

Helge Deller [Sun, 8 May 2022 08:18:40 +0000 (10:18 +0200)]

Revert "parisc: Fix patch code locking and flushing"

This reverts commit a9fe7fa7d874a536e0540469f314772c054a0323.

Leads to segfaults on 32bit kernel.

Signed-off-by: Helge Deller <deller@gmx.de>

commit | commitdiff | tree

Helge Deller [Sat, 7 May 2022 13:32:38 +0000 (15:32 +0200)]

Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"

This reverts commit d97180ad68bdb7ee10f327205a649bc2f558741d.

It triggers RCU stalls at boot with a 32-bit kernel.

Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v5.15+

commit | commitdiff | tree

Helge Deller [Sat, 7 May 2022 13:31:16 +0000 (15:31 +0200)]

Revert "parisc: Mark cr16 CPU clocksource unstable on all SMP machines"

This reverts commit afdb4a5b1d340e4afffc65daa21cc71890d7d589.

It triggers RCU stalls at boot with a 32-bit kernel.

Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v5.16+

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 17:28:22 +0000 (10:28 -0700)]

Merge tag 'core-urgent-2022-05-08' of git://git./linux/kernel/git/tip/tip

Pull PASID fix from Thomas Gleixner:
"A single bugfix for the PASID management code, which freed the PASID
  too early. The PASID needs to be tied to the mm lifetime, not to the
  address space lifetime"

* tag 'core-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm: Fix PASID use-after-free issue

commit | commitdiff | tree

Linus Torvalds [Sun, 8 May 2022 17:10:51 +0000 (10:10 -0700)]

Merge tag 'sound-5.18-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This became slightly larger as I've been off in the last weeks.

  The majority of changes here is about ASoC, fixes for dmaengine
  and for addressing issues reported by CI, as well as other
  device-specific small fixes.

  Also, fixes for FireWire core stack and the usual HD-audio quirks
  are included"

* tag 'sound-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
  ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback
  ASoC: ops: Validate input values in snd_soc_put_volsw_range()
  ASoC: dmaengine: Restore NULL prepare_slave_config() callback
  ASoC: atmel: mchp-pdmc: set prepare_slave_config
  ASoC: max98090: Generate notifications on changes for custom control
  ASoC: max98090: Reject invalid values in custom control put()
  ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes
  ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers
  firewire: core: extend card->lock in fw_core_handle_bus_reset
  firewire: remove check of list iterator against head past the loop body
  firewire: fix potential uaf in outbound_phy_packet_callback()
  ASoC: rt9120: Correct the reg 0x09 size to one byte
  ALSA: hda/realtek: Enable mute/micmute LEDs support for HP Laptops
  ALSA: hda/realtek: Fix mute led issue on thinkpad with cs35l41 s-codec
  ASoC: meson: axg-card: Fix nonatomic links
  ASoC: meson: axg-tdm-interface: Fix formatters in trigger"
  ASoC: soc-ops: fix error handling
  ASoC: meson: Fix event generation for G12A tohdmi mux
  ASoC: meson: Fix event generation for AUI CODEC mux
  ASoC: meson: Fix event generation for AUI ACODEC mux
  ...

commit | commitdiff | tree

Willy Tarreau [Sun, 8 May 2022 09:37:09 +0000 (11:37 +0200)]

blk-mq: remove the error_count from struct request

The last two users were floppy.c and ataflop.c respectively, it was
verified that no other drivers makes use of this, so let's remove it.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Minh Yuan <yuanmingbuaa@gmail.com>
Cc: Denis Efremov <efremov@linux.com>,
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Willy Tarreau [Sun, 8 May 2022 09:37:08 +0000 (11:37 +0200)]

ataflop: use a statically allocated error counters

This is the last driver making use of fd_request->error_count, which is
easy to get wrong as was shown in floppy.c. We don't need to keep it
there, it can be moved to the atari_floppy_struct instead, so let's do
this.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Minh Yuan <yuanmingbuaa@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Willy Tarreau [Sun, 8 May 2022 09:37:07 +0000 (11:37 +0200)]

floppy: use a statically allocated error counter

Interrupt handler bad_flp_intr() may cause a UAF on the recently freed
request just to increment the error count.  There's no point keeping
that one in the request anyway, and since the interrupt handler uses a
static pointer to the error which cannot be kept in sync with the
pending request, better make it use a static error counter that's reset
for each new request.  This reset now happens when entering
redo_fd_request() for a new request via set_next_request().

One initial concern about a single error counter was that errors on one
floppy drive could be reported on another one, but this problem is not
real given that the driver uses a single drive at a time, as that
PC-compatible controllers also have this limitation by using shared
signals.  As such the error count is always for the "current" drive.

Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Tested-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Takashi Iwai [Sun, 8 May 2022 08:49:25 +0000 (10:49 +0200)]

Merge tag 'asoc-fix-v5.18-rc4' of https://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.18

A larger collection of fixes than I'd like, mainly because mixer-test
is making it's way into the CI systems and turning up issues on a wider
range of systems. The most substantial thing though is a revert and an
alternative fix for a dmaengine issue where the fix caused disruption
for some other configurations, the core fix is backed out an a driver
specific thing done instead.

commit | commitdiff | tree

Linus Torvalds [Sat, 7 May 2022 18:02:02 +0000 (11:02 -0700)]

Merge tag 'gpio-fixes-for-v5.18-rc6' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix the bounds check for the 'gpio-reserved-ranges' device property
   in gpiolib-of

- drop the assignment of the pwm base number in gpio-mvebu (this was
   missed by the patch doing it globally for all pwm drivers)

- fix the fwnode assignment (use own fwnode, not the parent's one) for
   the GPIO irqchip in gpio-visconti

- update the irq_stat field before checking the trigger field in
   gpio-pca953x

- update GPIO entry in MAINTAINERS

* tag 'gpio-fixes-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)
  gpio: visconti: Fix fwnode of GPIO IRQ
  MAINTAINERS: update the GPIO git tree entry
  gpio: mvebu: drop pwm base assignment
  gpiolib: of: fix bounds check for 'gpio-reserved-ranges'

commit | commitdiff | tree

Linus Torvalds [Sat, 7 May 2022 17:47:51 +0000 (10:47 -0700)]

Merge tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A single revert for a change that isn't needed in 5.18, and a small
  series for s390/dasd"

* tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
  s390/dasd: Use kzalloc instead of kmalloc/memset
  s390/dasd: Fix read inconsistency for ESE DASD devices
  s390/dasd: Fix read for ESE with blksize < 4k
  s390/dasd: prevent double format of tracks for ESE devices
  s390/dasd: fix data corruption for ESE devices
  Revert "block: release rq qos structures for queue without disk"

commit | commitdiff | tree

Linus Torvalds [Sat, 7 May 2022 17:41:41 +0000 (10:41 -0700)]

Merge tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
"Just a single file assignment fix this week"

* tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
io_uring: assign non-fixed early for async work

commit | commitdiff | tree

Javier Martinez Canillas [Fri, 6 May 2022 13:22:25 +0000 (15:22 +0200)]

fbdev: efifb: Fix a use-after-free due early fb_info cleanup

Commit d258d00fb9c7 ("fbdev: efifb: Cleanup fb_info in .fb_destroy rather
than .remove") attempted to fix a use-after-free error due driver freeing
the fb_info in the .remove handler instead of doing it in .fb_destroy.

But ironically that change introduced yet another use-after-free since the
fb_info was still used after the free.

This should fix for good by freeing the fb_info at the end of the handler.

Fixes: d258d00fb9c7 ("fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reported-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Thomas Zimmermann <tzimemrmann@suse.de>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220506132225.588379-1-javierm@redhat.com

commit | commitdiff | tree

Christophe JAILLET [Wed, 9 Feb 2022 06:03:11 +0000 (07:03 +0100)]

drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()

If successful ida_simple_get() calls are not undone when needed, some
additional memory may be allocated and wasted.

Here, an ID between 0 and MAX_INT is required. If this ID is >=100, it is
not taken into account and is wasted. It should be released.

Instead of calling ida_simple_remove(), take advantage of the 'max'
parameter to require the ID not to be too big. Should it be too big, it
is not allocated and don't need to be freed.

While at it, use ida_alloc_xxx()/ida_free() instead to
ida_simple_get()/ida_simple_remove().
The latter is deprecated and more verbose.

Fixes: db1a0ae21461 ("drm/nouveau/bl: Assign different names to interfaces")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[Fixed formatting warning from checkpatch]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9ba85bca59df6813dc029e743a836451d5173221.1644386541.git.christophe.jaillet@wanadoo.fr

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 21:32:16 +0000 (14:32 -0700)]

Merge tag 'for-5.18-rc5-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Regression fixes in zone activation:

   - move a loop invariant out of the loop to avoid checking space
     status

   - properly handle unlimited activation

  Other fixes:

   - for subpage, force the free space v2 mount to avoid a warning and
     make it easy to switch a filesystem on different page size systems

   - export sysfs status of exclusive operation 'balance paused', so the
     user space tools can recognize it and allow adding a device with
     paused balance

   - fix assertion failure when logging directory key range item"

* tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: sysfs: export the balance paused state of exclusive operation
  btrfs: fix assertion failure when logging directory key range item
  btrfs: zoned: activate block group properly on unlimited active zone device
  btrfs: zoned: move non-changing condition check out of the loop
  btrfs: force v2 space cache usage for subpage mount

commit | commitdiff | tree

Robin Murphy [Tue, 5 Apr 2022 14:21:34 +0000 (15:21 +0100)]

drm/nouveau/tegra: Stop using iommu_present()

Even if some IOMMU has registered itself on the platform "bus", that
doesn't necessarily mean it provides translation for the device we
care about. Replace iommu_present() with a more appropriate check.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[added cc for stable]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org # v5.0+
Link: https://patchwork.freedesktop.org/patch/msgid/70d40ea441da3663c2824d54102b471e9a621f8a.1649168494.git.robin.murphy@arm.com

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 20:19:11 +0000 (13:19 -0700)]

Merge tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
"Highlights include:

  Stable fixes:

   - Fix a socket leak when setting up an AF_LOCAL RPC client

   - Ensure that knfsd connects to the gss-proxy daemon on setup

  Bugfixes:

   - Fix a refcount leak when migrating a task off an offlined transport

   - Don't gratuitously invalidate inode attributes on delegation return

   - Don't leak sockets in xs_local_connect()

   - Ensure timely close of disconnected AF_LOCAL sockets"

* tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  Revert "SUNRPC: attempt AF_LOCAL connect on setup"
  SUNRPC: Ensure gss-proxy connects on setup
  SUNRPC: Ensure timely close of disconnected AF_LOCAL sockets
  SUNRPC: Don't leak sockets in xs_local_connect()
  NFSv4: Don't invalidate inode attributes on delegation return
  SUNRPC release the transport of a relocated task with an assigned transport

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 18:42:58 +0000 (11:42 -0700)]

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"x86:

   - Account for family 17h event renumberings in AMD PMU emulation

   - Remove CPUID leaf 0xA on AMD processors

   - Fix lockdep issue with locking all vCPUs

   - Fix loss of A/D bits in SPTEs

   - Fix syzkaller issue with invalid guest state"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Exit to userspace if vCPU has injected exception and invalid state
  KVM: SEV: Mark nested locking of vcpu->lock
  kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU
  KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id
  KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bits
  KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits()
  KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 18:30:59 +0000 (11:30 -0700)]

Merge tag 'riscv-for-linus-5.18-rc6' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fix from Palmer Dabbelt:

- A fix to relocate the DTB early in boot, in cases where the
   bootloader doesn't put the DTB in a region that will end up
   mapped by the kernel.

   This manifests as a crash early in boot on a handful of
   configurations.

* tag 'riscv-for-linus-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: relocate DTB if it's outside memory region

commit | commitdiff | tree

Sean Christopherson [Mon, 2 May 2022 22:18:50 +0000 (22:18 +0000)]

KVM: VMX: Exit to userspace if vCPU has injected exception and invalid state

Exit to userspace with an emulation error if KVM encounters an injected
exception with invalid guest state, in addition to the existing check of
bailing if there's a pending exception (KVM doesn't support emulating
exceptions except when emulating real mode via vm86).

In theory, KVM should never get to such a situation as KVM is supposed to
exit to userspace before injecting an exception with invalid guest state.
But in practice, userspace can intervene and manually inject an exception
and/or stuff registers to force invalid guest state while a previously
injected exception is awaiting reinjection.

Fixes: fc4fad79fc3d ("KVM: VMX: Reject KVM_RUN if emulation is required with pending exception")
Reported-by: syzbot+cfafed3bb76d3e37581b@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220502221850.131873-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Peter Gonda [Mon, 2 May 2022 16:58:07 +0000 (09:58 -0700)]

KVM: SEV: Mark nested locking of vcpu->lock

svm_vm_migrate_from() uses sev_lock_vcpus_for_migration() to lock all
source and target vcpu->locks. Unfortunately there is an 8 subclass
limit, so a new subclass cannot be used for each vCPU. Instead maintain
ownership of the first vcpu's mutex.dep_map using a role specific
subclass: source vs target. Release the other vcpu's mutex.dep_maps.

Fixes: b56639318bb2b ("KVM: SEV: Add support for SEV intra host migration")
Reported-by: John Sperbeck<jsperbeck@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20220502165807.529624-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 16:50:25 +0000 (09:50 -0700)]

Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"A few recent regressions in rxe's multicast code, and some old driver
  bugs:

   - Error case unwind bug in rxe for rkeys

   - Dot not call netdev functions under a spinlock in rxe multicast
     code

   - Use the proper BH lock type in rxe multicast code

   - Fix idrma deadlock and crash

   - Add a missing flush to drain irdma QPs when in error

   - Fix high userspace latency in irdma during destroy due to
     synchronize_rcu()

   - Rare race in siw MPA processing"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Change mcg_lock to a _bh lock
  RDMA/rxe: Do not call  dev_mc_add/del() under a spinlock
  RDMA/siw: Fix a condition race issue in MPA request processing
  RDMA/irdma: Fix possible crash due to NULL netdev in notifier
  RDMA/irdma: Reduce iWARP QP destroy time
  RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
  RDMA/rxe: Recheck the MR in when generating a READ reply
  RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()
  RDMA/rxe: Fix "Replace mr by rkey in responder resources"

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 16:45:44 +0000 (09:45 -0700)]

Merge tag 'mmc-v5.18-rc4' of git://git./linux/kernel/git/ulfh/mmc

Pull mmc fixes from Ulf Hansson:
"MMC core:

   - Fix initialization for eMMC's HS200/HS400 mode

  MMC host:

   - sdhci-msm: Reset GCC_SDCC_BCR register to prevent timeout issues

   - sunxi-mmc: Fix DMA descriptors allocated above 32 bits"

* tag 'mmc-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
  mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
  mmc: core: Set HS clock speed before sending HS CMD13

commit | commitdiff | tree

Linus Torvalds [Fri, 6 May 2022 16:33:28 +0000 (09:33 -0700)]

Merge tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"A pretty quiet week, one fbdev, msm, kconfig, and two amdgpu fixes,
  about what I'd expect for rc6.

  fbdev:

   - hotunplugging fix

  amdgpu:

   - Fix a xen dom0 regression on APUs

   - Fix a potential array overflow if a receiver were to send an
     erroneous audio channel count

  msm:

   - lockdep fix.

  it6505:

   - kconfig fix"

* tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
  drm/amdgpu: do not use passthrough mode in Xen dom0
  drm/bridge: ite-it6505: add missing Kconfig option select
  fbdev: Make fb_release() return -ENODEV if fbdev was unregistered
  drm/msm/dp: remove fail safe mode related code

commit | commitdiff | tree

Puyou Lu [Fri, 6 May 2022 08:06:30 +0000 (16:06 +0800)]

gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)

When one port's input state get inverted (eg. from low to hight) after
pca953x_irq_setup but before setting irq_mask (by some other driver such as
"gpio-keys"), the next inversion of this port (eg. from hight to low) will not
be triggered any more (because irq_stat is not updated at the first time). Issue
should be fixed after this commit.

Fixes: 89ea8bbe9c3e ("gpio: pca953x.c: add interrupt handling capability")
Signed-off-by: Puyou Lu <puyou.lu@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>

commit | commitdiff | tree

Eric Yang [Sat, 19 Mar 2022 20:34:24 +0000 (16:34 -0400)]

drm/amd/display: undo clearing of z10 related function pointers

[Why]
Z10 and S0i3 have some shared path. Previous code clean up ,
incorrectly removed these pointers, which breaks s0i3 restore

[How]
Do not clear the function pointers based on Z10 disable.

Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Richard Gong [Fri, 8 Apr 2022 17:08:38 +0000 (12:08 -0500)]

drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems

Active State Power Management (ASPM) feature is enabled since kernel 5.14.
There are some AMD Volcanic Islands (VI) GFX cards, such as the WX3200 and
RX640, that do not work with ASPM-enabled Intel Alder Lake based systems.
Using these GFX cards as video/display output, Intel Alder Lake based
systems will freeze after suspend/resume.

The issue was originally reported on one system (Dell Precision 3660 with
BIOS version 0.14.81), but was later confirmed to affect at least 4
pre-production Alder Lake based systems.

Add an extra check to disable ASPM on Intel Alder Lake based systems with
the problematic AMD Volcanic Islands GFX cards.

Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
Signed-off-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

commit | commitdiff | tree

Javier Martinez Canillas [Thu, 5 May 2022 22:06:31 +0000 (00:06 +0200)]

fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove

The driver is calling framebuffer_release() in its .remove callback, but
this will cause the struct fb_info to be freed too early. Since it could
be that a reference is still hold to it if user-space opened the fbdev.

This would lead to a use-after-free error if the framebuffer device was
unregistered but later a user-space process tries to close the fbdev fd.

To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy
instead of doing it in the driver's .remove callback.

Strictly speaking, the code flow in the driver is still wrong because all
the hardware cleanupd (i.e: iounmap) should be done in .remove while the
software cleanup (i.e: releasing the framebuffer) should be done in the
.fb_destroy handler. But this at least makes to match the behavior before
commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").

Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220505220631.366371-1-javierm@redhat.com

commit | commitdiff | tree

Javier Martinez Canillas [Thu, 5 May 2022 22:05:40 +0000 (00:05 +0200)]

fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove

The driver is calling framebuffer_release() in its .remove callback, but
this will cause the struct fb_info to be freed too early. Since it could
be that a reference is still hold to it if user-space opened the fbdev.

This would lead to a use-after-free error if the framebuffer device was
unregistered but later a user-space process tries to close the fbdev fd.

To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy
instead of doing it in the driver's .remove callback.

Strictly speaking, the code flow in the driver is still wrong because all
the hardware cleanupd (i.e: iounmap) should be done in .remove while the
software cleanup (i.e: releasing the framebuffer) should be done in the
.fb_destroy handler. But this at least makes to match the behavior before
commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").

Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220505220540.366218-1-javierm@redhat.com

commit | commitdiff | tree

Javier Martinez Canillas [Thu, 5 May 2022 22:04:56 +0000 (00:04 +0200)]

fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove

The driver is calling framebuffer_release() in its .remove callback, but
this will cause the struct fb_info to be freed too early. Since it could
be that a reference is still hold to it if user-space opened the fbdev.

This would lead to a use-after-free error if the framebuffer device was
unregistered but later a user-space process tries to close the fbdev fd.

To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy
instead of doing it in the driver's .remove callback.

Strictly speaking, the code flow in the driver is still wrong because all
the hardware cleanupd (i.e: iounmap) should be done in .remove while the
software cleanup (i.e: releasing the framebuffer) should be done in the
.fb_destroy handler. But this at least makes to match the behavior before
commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").

Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220505220456.366090-1-javierm@redhat.com

commit | commitdiff | tree

Daniel Vetter [Thu, 5 May 2022 22:04:13 +0000 (00:04 +0200)]

fbdev: Prevent possible use-after-free in fb_release()

Most fbdev drivers have issues with the fb_info lifetime, because call to
framebuffer_release() from their driver's .remove callback, rather than
doing from fbops.fb_destroy callback.

Doing that will destroy the fb_info too early, while references to it may
still exist, leading to a use-after-free error.

To prevent this, check the fb_info reference counter when attempting to
kfree the data structure in framebuffer_release(). That will leak it but
at least will prevent the mentioned error.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220505220413.365977-1-javierm@redhat.com

commit | commitdiff | tree

Javier Martinez Canillas [Wed, 4 May 2022 11:59:17 +0000 (13:59 +0200)]

Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"

This reverts commit aafa025c76dcc7d1a8c8f0bdefcbe4eb480b2f6a. That commit
attempted to fix a NULL pointer dereference, caused by the struct fb_info
associated with a framebuffer device to not longer be valid when the file
descriptor was closed.

The issue was exposed by commit 27599aacbaef ("fbdev: Hot-unplug firmware
fb devices on forced removal"), which added a new path that goes through
the struct device removal instead of directly unregistering the fb.

Most fbdev drivers have issues with the fb_info lifetime, because call to
framebuffer_release() from their driver's .remove callback, rather than
doing from fbops.fb_destroy callback. This meant that due to this switch,
the fb_info was now destroyed too early, while references still existed,
while before it was simply leaked.

The patch we're reverting here reinstated that leak, hence "fixed" the
regression. But the proper solution is to fix the drivers to not release
the fb_info too soon.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220504115917.758787-1-javierm@redhat.com

commit | commitdiff | tree

Kajol Jain [Thu, 5 May 2022 15:34:51 +0000 (21:04 +0530)]

powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE

With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks for string size which can panic the kernel, like incase
of overflow detection.

In papr_scm, papr_scm_pmu_check_events function uses stat->stat_id with
string operations, to populate the nvdimm_events_map array. Since
stat_id variable is not NULL terminated, the kernel panics with
CONFIG_FORTIFY_SOURCE enabled at boot time.

Below are the logs of kernel panic:

  detected buffer overflow in __fortify_strlen
  ------------[ cut here ]------------
  kernel BUG at lib/string_helpers.c:980!
  Oops: Exception in kernel mode, sig: 5 [#1]
  NIP [c00000000077dad0] fortify_panic+0x28/0x38
  LR [c00000000077dacc] fortify_panic+0x24/0x38
  Call Trace:
  [c0000022d77836e0] [c00000000077dacc] fortify_panic+0x24/0x38 (unreliable)
  [c00800000deb2660] papr_scm_pmu_check_events.constprop.0+0x118/0x220 [papr_scm]
  [c00800000deb2cb0] papr_scm_probe+0x288/0x62c [papr_scm]
  [c0000000009b46a8] platform_probe+0x98/0x150

Fix this issue by using kmemdup_nul() to copy the content of
stat->stat_id directly to the nvdimm_events_map array.

mpe: stat->stat_id comes from the hypervisor, not userspace, so there is
no security exposure.

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220505153451.35503-1-kjain@linux.ibm.com

commit | commitdiff | tree

Haowen Bai [Thu, 5 May 2022 14:17:33 +0000 (16:17 +0200)]

s390/dasd: Use kzalloc instead of kmalloc/memset

Use kzalloc rather than duplicating its implementation, which
makes code simple and easy to understand.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20220505141733.1989450-6-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Jan Höppner [Thu, 5 May 2022 14:17:32 +0000 (16:17 +0200)]

s390/dasd: Fix read inconsistency for ESE DASD devices

Read requests that return with NRF error are partially completed in
dasd_eckd_ese_read(). The function keeps track of the amount of
processed bytes and the driver will eventually return this information
back to the block layer for further processing via __dasd_cleanup_cqr()
when the request is in the final stage of processing (from the driver's
perspective).

For this, blk_update_request() is used which requires the number of
bytes to complete the request. As per documentation the nr_bytes
parameter is described as follows:
"number of bytes to complete for @req".

This was mistakenly interpreted as "number of bytes _left_ for @req"
leading to new requests with incorrect data length. The consequence are
inconsistent and completely wrong read requests as data from random
memory areas are read back.

Fix this by correctly specifying the amount of bytes that should be used
to complete the request.

Fixes: 5e6bdd37c552 ("s390/dasd: fix data corruption for thin provisioned devices")
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20220505141733.1989450-5-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Jan Höppner [Thu, 5 May 2022 14:17:31 +0000 (16:17 +0200)]

s390/dasd: Fix read for ESE with blksize < 4k

When reading unformatted tracks on ESE devices, the corresponding memory
areas are simply set to zero for each segment. This is done incorrectly
for blocksizes < 4096.

There are two problems. First, the increment of dst is done using the
counter of the loop (off), which is increased by blksize every
iteration. This leads to a much bigger increment for dst as actually
intended. Second, the increment of dst is done before the memory area
is set to 0, skipping a significant amount of bytes of memory.

This leads to illegal overwriting of memory and ultimately to a kernel
panic.

This is not a problem with 4k blocksize because
blk_queue_max_segment_size is set to PAGE_SIZE, always resulting in a
single iteration for the inner segment loop (bv.bv_len == blksize). The
incorrectly used 'off' value to increment dst is 0 and the correct
memory area is used.

In order to fix this for blksize < 4k, increment dst correctly using the
blksize and only do it at the end of the loop.

Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20220505141733.1989450-4-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Stefan Haberland [Thu, 5 May 2022 14:17:30 +0000 (16:17 +0200)]

s390/dasd: prevent double format of tracks for ESE devices

For ESE devices we get an error for write operations on an unformatted
track. Afterwards the track will be formatted and the IO operation
restarted.
When using alias devices a track might be accessed by multiple requests
simultaneously and there is a race window that a track gets formatted
twice resulting in data loss.

Prevent this by remembering the amount of formatted tracks when starting
a request and comparing this number before actually formatting a track
on the fly. If the number has changed there is a chance that the current
track was finally formatted in between. As a result do not format the
track and restart the current IO to check.

The number of formatted tracks does not match the overall number of
formatted tracks on the device and it might wrap around but this is no
problem. It is only needed to recognize that a track has been formatted at
all in between.

Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Link: https://lore.kernel.org/r/20220505141733.1989450-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Stefan Haberland [Thu, 5 May 2022 14:17:29 +0000 (16:17 +0200)]

s390/dasd: fix data corruption for ESE devices

For ESE devices we get an error when accessing an unformatted track.
The handling of this error will return zero data for read requests and
format the track on demand before writing to it. To do this the code needs
to distinguish between read and write requests. This is done with data from
the blocklayer request. A pointer to the blocklayer request is stored in
the CQR.

If there is an error on the device an ERP request is built to do error
recovery. While the ERP request is mostly a copy of the original CQR the
pointer to the blocklayer request is not copied to not accidentally pass
it back to the blocklayer without cleanup.

This leads to the error that during ESE handling after an ERP request was
built it is not possible to determine the IO direction. This leads to the
formatting of a track for read requests which might in turn lead to data
corruption.

Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Link: https://lore.kernel.org/r/20220505141733.1989450-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Dave Airlie [Fri, 6 May 2022 01:17:59 +0000 (11:17 +1000)]

Merge tag 'drm-msm-fixes-2022-04-30' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

single lockdep fix.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtkzqzxDLp82OaKXVrWd7nWZtkxKsuOK1wOGCDz7qF-dA@mail.gmail.com

commit | commitdiff | tree

Dave Airlie [Fri, 6 May 2022 00:56:20 +0000 (10:56 +1000)]

Merge tag 'drm-misc-fixes-2022-05-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.18-rc6:
- Small fix for hot-unplugging fb devices.
- Kconfig fix for it6505.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/69e51773-8c6f-4ff7-9a06-5c2922a43999@linux.intel.com

commit | commitdiff | tree

Dave Airlie [Thu, 5 May 2022 23:59:47 +0000 (09:59 +1000)]

Merge tag 'amd-drm-fixes-5.18-2022-05-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.18-2022-05-04:

amdgpu:
- Fix a xen dom0 regression on APUs
- Fix a potential array overflow if a receiver were to
send an erroneous audio channel count

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220504190439.5723-1-alexander.deucher@amd.com

commit | commitdiff | tree

Linus Torvalds [Thu, 5 May 2022 23:52:15 +0000 (16:52 -0700)]

Merge tag 'folio-5.18f' of git://git.infradead.org/users/willy/pagecache

Pull folio fixes from Matthew Wilcox:
"Two folio fixes for 5.18.

  Darrick and Brian have done amazing work debugging the race I created
  in the folio BIO iterator. The readahead problem was deterministic, so
  easy to fix.

   - Fix a race when we were calling folio_next() in the BIO folio iter
     without holding a reference, meaning the folio could be split or
     freed, and we'd jump to the next page instead of the intended next
     folio.

   - Fix readahead creating single-page folios instead of the intended
     large folios when doing reads that are not a power of two in size"

* tag 'folio-5.18f' of git://git.infradead.org/users/willy/pagecache:
  mm/readahead: Fix readahead with large folios
  block: Do not call folio_next() on an unreferenced folio

commit | commitdiff | tree

Linus Torvalds [Thu, 5 May 2022 22:50:27 +0000 (15:50 -0700)]

Merge tag 'devicetree-fixes-for-5.18-3' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Drop unused 'max-link-speed' in Apple PCIe

- More redundant 'maxItems/minItems' schema fixes

- Support values for pinctrl 'drive-push-pull' and 'drive-open-drain'

- Fix redundant 'unevaluatedProperties' in MT6360 LEDs binding

- Add missing 'power-domains' property to Cadence UFSHC

* tag 'devicetree-fixes-for-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: pci: apple,pcie: Drop max-link-speed from example
  dt-bindings: Drop redundant 'maxItems/minItems' in if/then schemas
  dt-bindings: pinctrl: Allow values for drive-push-pull and drive-open-drain
  dt-bindings: leds-mt6360: Drop redundant 'unevaluatedProperties'
  dt-bindings: ufs: cdns,ufshc: Add power-domains

commit | commitdiff | tree

David Sterba [Tue, 3 May 2022 15:35:25 +0000 (17:35 +0200)]

btrfs: sysfs: export the balance paused state of exclusive operation

The new state allowing device addition with paused balance is not
exported to user space so it can't recognize it and actually start the
operation.

Fixes: efc0e69c2fea ("btrfs: introduce exclusive operation BALANCE_PAUSED state")
CC: stable@vger.kernel.org # 5.17
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Tue, 3 May 2022 10:57:02 +0000 (11:57 +0100)]

btrfs: fix assertion failure when logging directory key range item

When inserting a key range item (BTRFS_DIR_LOG_INDEX_KEY) while logging
a directory, we don't expect the insertion to fail with -EEXIST, because
we are holding the directory's log_mutex and we have dropped all existing
BTRFS_DIR_LOG_INDEX_KEY keys from the log tree before we started to log
the directory. However it's possible that during the logging we attempt
to insert the same BTRFS_DIR_LOG_INDEX_KEY key twice, but for this to
happen we need to race with insertions of items from other inodes in the
subvolume's tree while we are logging a directory. Here's how this can
happen:

1) We are logging a directory with inode number 1000 that has its items
   spread across 3 leaves in the subvolume's tree:

   leaf A - has index keys from the range 2 to 20 for example. The last
   item in the leaf corresponds to a dir item for index number 20. All
   these dir items were created in a past transaction.

   leaf B - has index keys from the range 22 to 100 for example. It has
   no keys from other inodes, all its keys are dir index keys for our
   directory inode number 1000. Its first key is for the dir item with
   a sequence number of 22. All these dir items were also created in a
   past transaction.

   leaf C - has index keys for our directory for the range 101 to 120 for
   example. This leaf also has items from other inodes, and its first
   item corresponds to the dir item for index number 101 for our directory
   with inode number 1000;

2) When we finish processing the items from leaf A at log_dir_items(),
   we log a BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and a last
   offset of 21, meaning the log is authoritative for the index range
   from 21 to 21 (a single sequence number). At this point leaf B was
   not yet modified in the current transaction;

3) When we return from log_dir_items() we have released our read lock on
   leaf B, and have set *last_offset_ret to 21 (index number of the first
   item on leaf B minus 1);

4) Some other task inserts an item for other inode (inode number 1001 for
   example) into leaf C. That resulted in pushing some items from leaf C
   into leaf B, in order to make room for the new item, so now leaf B
   has dir index keys for the sequence number range from 22 to 102 and
   leaf C has the dir items for the sequence number range 103 to 120;

5) At log_directory_changes() we call log_dir_items() again, passing it
   a 'min_offset' / 'min_key' value of 22 (*last_offset_ret from step 3
   plus 1, so 21 + 1). Then btrfs_search_forward() leaves us at slot 0
   of leaf B, since leaf B was modified in the current transaction.

   We have also initialized 'last_old_dentry_offset' to 20 after calling
   btrfs_previous_item() at log_dir_items(), as it left us at the last
   item of leaf A, which refers to the dir item with sequence number 20;

6) We then call process_dir_items_leaf() to process the dir items of
   leaf B, and when we process the first item, corresponding to slot 0,
   sequence number 22, we notice the dir item was created in a past
   transaction and its sequence number is greater than the value of
   *last_old_dentry_offset + 1 (20 + 1), so we decide to log again a
   BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and an end range
   of 21 (key.offset - 1 == 22 - 1 == 21), which results in an -EEXIST
   error from insert_dir_log_key(), as we have already inserted that
   key at step 2, triggering the assertion at process_dir_items_leaf().

The trace produced in dmesg is like the following:

assertion failed: ret != -EEXIST, in fs/btrfs/tree-log.c:3857
[198255.980839][ T7460] ------------[ cut here ]------------
[198255.981666][ T7460] kernel BUG at fs/btrfs/ctree.h:3617!
[198255.983141][ T7460] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[198255.984080][ T7460] CPU: 0 PID: 7460 Comm: repro-ghost-dir Not tainted 5.18.0-5314c78ac373-misc-next+
[198255.986027][ T7460] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[198255.988600][ T7460] RIP: 0010:assertfail.constprop.0+0x1c/0x1e
[198255.989465][ T7460] Code: 8b 4c 89 (...)
[198255.992599][ T7460] RSP: 0018:ffffc90007387188 EFLAGS: 00010282
[198255.993414][ T7460] RAX: 000000000000003d RBX: 0000000000000065 RCX: 0000000000000000
[198255.996056][ T7460] RDX: 0000000000000001 RSI: ffffffff8b62b180 RDI: fffff52000e70e24
[198255.997668][ T7460] RBP: ffffc90007387188 R08: 000000000000003d R09: ffff8881f0e16507
[198255.999199][ T7460] R10: ffffed103e1c2ca0 R11: 0000000000000001 R12: 00000000ffffffef
[198256.000683][ T7460] R13: ffff88813befc630 R14: ffff888116c16e70 R15: ffffc90007387358
[198256.007082][ T7460] FS:  00007fc7f7c24640(0000) GS:ffff8881f0c00000(0000) knlGS:0000000000000000
[198256.009939][ T7460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198256.014133][ T7460] CR2: 0000560bb16d0b78 CR3: 0000000140b34005 CR4: 0000000000170ef0
[198256.015239][ T7460] Call Trace:
[198256.015674][ T7460]  <TASK>
[198256.016313][ T7460]  log_dir_items.cold+0x16/0x2c
[198256.018858][ T7460]  ? replay_one_extent+0xbf0/0xbf0
[198256.025932][ T7460]  ? release_extent_buffer+0x1d2/0x270
[198256.029658][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.031114][ T7460]  ? lock_acquired+0xbe/0x660
[198256.032633][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.034386][ T7460]  ? lock_release+0xcf/0x8a0
[198256.036152][ T7460]  log_directory_changes+0xf9/0x170
[198256.036993][ T7460]  ? log_dir_items+0xba0/0xba0
[198256.037661][ T7460]  ? do_raw_write_unlock+0x7d/0xe0
[198256.038680][ T7460]  btrfs_log_inode+0x233b/0x26d0
[198256.041294][ T7460]  ? log_directory_changes+0x170/0x170
[198256.042864][ T7460]  ? btrfs_attach_transaction_barrier+0x60/0x60
[198256.045130][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.046568][ T7460]  ? lock_release+0xcf/0x8a0
[198256.047504][ T7460]  ? lock_downgrade+0x420/0x420
[198256.048712][ T7460]  ? ilookup5_nowait+0x81/0xa0
[198256.049747][ T7460]  ? lock_downgrade+0x420/0x420
[198256.050652][ T7460]  ? do_raw_spin_unlock+0xa9/0x100
[198256.051618][ T7460]  ? __might_resched+0x128/0x1c0
[198256.052511][ T7460]  ? __might_sleep+0x66/0xc0
[198256.053442][ T7460]  ? __kasan_check_read+0x11/0x20
[198256.054251][ T7460]  ? iget5_locked+0xbd/0x150
[198256.054986][ T7460]  ? run_delayed_iput_locked+0x110/0x110
[198256.055929][ T7460]  ? btrfs_iget+0xc7/0x150
[198256.056630][ T7460]  ? btrfs_orphan_cleanup+0x4a0/0x4a0
[198256.057502][ T7460]  ? free_extent_buffer+0x13/0x20
[198256.058322][ T7460]  btrfs_log_inode+0x2654/0x26d0
[198256.059137][ T7460]  ? log_directory_changes+0x170/0x170
[198256.060020][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.060930][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.061905][ T7460]  ? lock_contended+0x770/0x770
[198256.062682][ T7460]  ? btrfs_log_inode_parent+0xd04/0x1750
[198256.063582][ T7460]  ? lock_downgrade+0x420/0x420
[198256.064432][ T7460]  ? preempt_count_sub+0x18/0xc0
[198256.065550][ T7460]  ? __mutex_lock+0x580/0xdc0
[198256.066654][ T7460]  ? stack_trace_save+0x94/0xc0
[198256.068008][ T7460]  ? __kasan_check_write+0x14/0x20
[198256.072149][ T7460]  ? __mutex_unlock_slowpath+0x12a/0x430
[198256.073145][ T7460]  ? mutex_lock_io_nested+0xcd0/0xcd0
[198256.074341][ T7460]  ? wait_for_completion_io_timeout+0x20/0x20
[198256.075345][ T7460]  ? lock_downgrade+0x420/0x420
[198256.076142][ T7460]  ? lock_contended+0x770/0x770
[198256.076939][ T7460]  ? do_raw_spin_lock+0x1c0/0x1c0
[198256.078401][ T7460]  ? btrfs_sync_file+0x5e6/0xa40
[198256.080598][ T7460]  btrfs_log_inode_parent+0x523/0x1750
[198256.081991][ T7460]  ? wait_current_trans+0xc8/0x240
[198256.083320][ T7460]  ? lock_downgrade+0x420/0x420
[198256.085450][ T7460]  ? btrfs_end_log_trans+0x70/0x70
[198256.086362][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.087544][ T7460]  ? lock_release+0xcf/0x8a0
[198256.088305][ T7460]  ? lock_downgrade+0x420/0x420
[198256.090375][ T7460]  ? dget_parent+0x8e/0x300
[198256.093538][ T7460]  ? do_raw_spin_lock+0x1c0/0x1c0
[198256.094918][ T7460]  ? lock_downgrade+0x420/0x420
[198256.097815][ T7460]  ? do_raw_spin_unlock+0xa9/0x100
[198256.101822][ T7460]  ? dget_parent+0xb7/0x300
[198256.103345][ T7460]  btrfs_log_dentry_safe+0x48/0x60
[198256.105052][ T7460]  btrfs_sync_file+0x629/0xa40
[198256.106829][ T7460]  ? start_ordered_ops.constprop.0+0x120/0x120
[198256.109655][ T7460]  ? __fget_files+0x161/0x230
[198256.110760][ T7460]  vfs_fsync_range+0x6d/0x110
[198256.111923][ T7460]  ? start_ordered_ops.constprop.0+0x120/0x120
[198256.113556][ T7460]  __x64_sys_fsync+0x45/0x70
[198256.114323][ T7460]  do_syscall_64+0x5c/0xc0
[198256.115084][ T7460]  ? syscall_exit_to_user_mode+0x3b/0x50
[198256.116030][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.116768][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.117555][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.118324][ T7460]  ? sysvec_call_function_single+0x57/0xc0
[198256.119308][ T7460]  ? asm_sysvec_call_function_single+0xa/0x20
[198256.120363][ T7460]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[198256.121334][ T7460] RIP: 0033:0x7fc7fe97b6ab
[198256.122067][ T7460] Code: 0f 05 48 (...)
[198256.125198][ T7460] RSP: 002b:00007fc7f7c23950 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[198256.126568][ T7460] RAX: ffffffffffffffda RBX: 00007fc7f7c239f0 RCX: 00007fc7fe97b6ab
[198256.127942][ T7460] RDX: 0000000000000002 RSI: 000056167536bcf0 RDI: 0000000000000004
[198256.129302][ T7460] RBP: 0000000000000004 R08: 0000000000000000 R09: 000000007ffffeb8
[198256.130670][ T7460] R10: 00000000000001ff R11: 0000000000000293 R12: 0000000000000001
[198256.132046][ T7460] R13: 0000561674ca8140 R14: 00007fc7f7c239d0 R15: 000056167536dab8
[198256.133403][ T7460]  </TASK>

Fix this by treating -EEXIST as expected at insert_dir_log_key() and have
it update the item with an end offset corresponding to the maximum between
the previously logged end offset and the new requested end offset. The end
offsets may be different due to dir index key deletions that happened as
part of unlink operations while we are logging a directory (triggered when
fsyncing some other inode parented by the directory) or during renames
which always attempt to log a single dir index deletion.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/YmyefE9mc2xl5ZMz@hungrycats.org/
Fixes: 732d591a5d6c12 ("btrfs: stop copying old dir items when logging a directory")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Naohiro Aota [Tue, 3 May 2022 21:10:05 +0000 (14:10 -0700)]

btrfs: zoned: activate block group properly on unlimited active zone device

btrfs_zone_activate() checks if it activated all the underlying zones in
the loop. However, that check never hit on an unlimited activate zone
device (max_active_zones == 0).

Fortunately, it still works without ENOSPC because btrfs_zone_activate()
returns true in the end, even if block_group->zone_is_active == 0. But, it
is confusing to have non zone_is_active block group still usable for
allocation. Also, we are wasting CPU time to iterate the loop every time
btrfs_zone_activate() is called for the blog groups.

Since error case in the loop is handled by out_unlock, we can just set
zone_is_active and do the list stuff after the loop.

Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Naohiro Aota [Tue, 3 May 2022 21:10:04 +0000 (14:10 -0700)]

btrfs: zoned: move non-changing condition check out of the loop

btrfs_zone_activate() checks if block_group->alloc_offset ==
block_group->zone_capacity every time it iterates the loop. But, it is
not depending on the index. Move out the check and do it only once.

Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Qu Wenruo [Fri, 1 Apr 2022 07:29:37 +0000 (15:29 +0800)]

btrfs: force v2 space cache usage for subpage mount

[BUG]
For a 4K sector sized btrfs with v1 cache enabled and only mounted on
systems with 4K page size, if it's mounted on subpage (64K page size)
systems, it can cause the following warning on v1 space cache:

BTRFS error (device dm-1): csum mismatch on free space cache
BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now

Although not a big deal, as kernel can rebuild it without problem, such
warning will bother end users, especially if they want to switch the
same btrfs seamlessly between different page sized systems.

[CAUSE]
V1 free space cache is still using fixed PAGE_SIZE for various bitmap,
like BITS_PER_BITMAP.

Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1
cache size to checksum.

Thus kernel will always reject v1 cache with a different PAGE_SIZE with
csum mismatch.

[FIX]
Although we should fix v1 cache, it's already going to be marked
deprecated soon.

And we have v2 cache based on metadata (which is already fully subpage
compatible), and it has almost everything superior than v1 cache.

So just force subpage mount to use v2 cache on mount.

Reported-by: Matt Corallo <blnxfsl@bluematt.me>
CC: stable@vger.kernel.org # 5.15+
Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 5 May 2022 17:38:11 +0000 (10:38 -0700)]

Merge tag 's390-5.18-4' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

- Disable -Warray-bounds warning for gcc12, since the only known way to
   workaround false positive warnings on lowcore accesses would result
   in worse code on fast paths.

- Avoid lockdep_assert_held() warning in kvm vm memop code.

- Reduce overhead within gmap_rmap code to get rid of long latencies
   when e.g. shutting down 2nd level guests.

* tag 's390-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  KVM: s390: vsie/gmap: reduce gmap_rmap overhead
  KVM: s390: Fix lockdep issue in vm memop
  s390: disable -Warray-bounds

commit | commitdiff | tree

Linus Torvalds [Thu, 5 May 2022 17:27:30 +0000 (10:27 -0700)]

Merge tag 'mips-fixes_5.18_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
"Extend R4000/R4400 CPU erratum workaround to all revisions"

* tag 'mips-fixes_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix CP0 counter erratum detection for R4k CPUs

commit | commitdiff | tree

Linus Torvalds [Thu, 5 May 2022 16:45:12 +0000 (09:45 -0700)]

Merge tag 'net-5.18-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from can, rxrpc and wireguard.

  Previous releases - regressions:

   - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()

   - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()

   - rds: acquire netns refcount on TCP sockets

   - rxrpc: enable IPv6 checksums on transport socket

   - nic: hinic: fix bug of wq out of bound access

   - nic: thunder: don't use pci_irq_vector() in atomic context

   - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS
     flag

   - nic: mlx5e:
      - lag, fix use-after-free in fib event handler
      - fix deadlock in sync reset flow

  Previous releases - always broken:

   - tcp: fix insufficient TCP source port randomness

   - can: grcan: grcan_close(): fix deadlock

   - nfc: reorder destructive operations in to avoid bugs

  Misc:

   - wireguard: improve selftests reliability"

* tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  NFC: netlink: fix sleep in atomic bug when firmware download timeout
  selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
  tcp: drop the hash_32() part from the index calculation
  tcp: increase source port perturb table to 2^16
  tcp: dynamically allocate the perturb table used by source ports
  tcp: add small random increments to the source port
  tcp: resalt the secret every 10 seconds
  tcp: use different parts of the port_offset for index and offset
  secure_seq: use the 64 bits of the siphash for port offset calculation
  wireguard: selftests: set panic_on_warn=1 from cmdline
  wireguard: selftests: bump package deps
  wireguard: selftests: restore support for ccache
  wireguard: selftests: use newer toolchains to fill out architectures
  wireguard: selftests: limit parallelism to $(nproc) tests at once
  wireguard: selftests: make routing loop test non-fatal
  net/mlx5: Fix matching on inner TTC
  net/mlx5: Avoid double clear or set of sync reset requested
  net/mlx5: Fix deadlock in sync reset flow
  net/mlx5e: Fix trust state reset in reload
  net/mlx5e: Avoid checking offload capability in post_parse action
  ...

commit | commitdiff | tree

Nobuhiro Iwamatsu [Thu, 21 Apr 2022 09:42:28 +0000 (18:42 +0900)]

gpio: visconti: Fix fwnode of GPIO IRQ

The fwnode of GPIO IRQ must be set to its own fwnode, not the fwnode of the
parent IRQ. Therefore, this sets own fwnode instead of the parent IRQ fwnode to
GPIO IRQ's.

Fixes: 2ad74f40dacc ("gpio: visconti: Add Toshiba Visconti GPIO support")
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>

commit | commitdiff | tree

Thomas Pfaff [Mon, 2 May 2022 11:28:29 +0000 (13:28 +0200)]

genirq: Synchronize interrupt thread startup

A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:

   setserial
     open("/dev/ttyXXX")
       request_irq()
     do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
      desc->threads_active++;
     close()
       free_irq()
         kthread_stop(irq_thread)
     synchronize_irq() <- hangs because desc->threads_active != 0

The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.

This problem was introduced with commit 519cc8652b3a, which addressed a
interrupt sharing issue in the PCIe code.

Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.

To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.

This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.

Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().

This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.

[ tglx: Amended changelog ]

Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com

commit | commitdiff | tree

Bartosz Golaszewski [Mon, 2 May 2022 09:34:16 +0000 (11:34 +0200)]

MAINTAINERS: update the GPIO git tree entry

My git tree has become the de facto main GPIO tree. Update the
MAINTAINERS file to reflect that.

Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reported-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

commit | commitdiff | tree

Duoming Zhou [Wed, 4 May 2022 05:58:47 +0000 (13:58 +0800)]

NFC: netlink: fix sleep in atomic bug when firmware download timeout

There are sleep in atomic bug that could cause kernel panic during
firmware download process. The root cause is that nlmsg_new with
GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer
handler. The call trace is shown below:

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
Call Trace:
kmem_cache_alloc_node
__alloc_skb
nfc_genl_fw_download_done
call_timer_fn
__run_timers.part.0
run_timer_softirq
__do_softirq
...

The nlmsg_new with GFP_KERNEL parameter may sleep during memory
allocation process, and the timer handler is run as the result of
a "software interrupt" that should not call any other function
that could sleep.

This patch changes allocation mode of netlink message from GFP_KERNEL
to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.

Fixes: 9674da8759df ("NFC: Add firmware upload netlink command")
Fixes: 9ea7187c53f6 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Wed, 27 Apr 2022 21:01:28 +0000 (17:01 -0400)]

mm/readahead: Fix readahead with large folios

Reading 100KB chunks from a big file (eg dd bs=100K) leads to poor
readahead behaviour.  Studying the traces in detail, I noticed two
problems.

The first is that we were setting the readahead flag on the folio which
contains the last byte read from the block.  This is wrong because we
will trigger readahead at the end of the read without waiting to see
if a subsequent read is going to use the pages we just read.  Instead,
we need to set the readahead flag on the first folio _after_ the one
which contains the last byte that we're reading.

The second is that we were looking for the index of the folio with the
readahead flag set to exactly match the start + size - async_size.
If we've rounded this, either down (as previously) or up (as now),
we'll think we hit a folio marked as readahead by a different read,
and try to read the wrong pages.  So round the expected index to the
order of the folio we hit.

Reported-by: Guo Xuenan <guoxuenan@huawei.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Tue, 3 May 2022 04:09:31 +0000 (00:09 -0400)]

block: Do not call folio_next() on an unreferenced folio

It is unsafe to call folio_next() on a folio unless you hold a reference
on it that prevents it from being split or freed.  After returning
from the iterator, iomap calls folio_end_writeback() which may drop
the last reference to the page, or allow the page to be split.  If that
happens, the iterator will not advance far enough through the bio_vec,
leading to assertion failures like the BUG() in folio_end_writeback()
that checks we're not trying to end writeback on a page not currently
under writeback.  Other assertion failures were also seen, but they're
all explained by this one bug.

Fix the bug by remembering where the next folio starts before returning
from the iterator.  There are other ways of fixing this bug, but this
seems the simplest.

Reported-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

commit | commitdiff | tree

Vladimir Oltean [Tue, 3 May 2022 12:14:28 +0000 (15:14 +0300)]

selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer

As discussed here with Ido Schimmel:
https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-jianbol@nvidia.com/

the default conform-exceed action is "reclassify", for a reason we don't
really understand.

The point is that hardware can't offload that police action, so not
specifying "conform-exceed" was always wrong, even though the command
used to work in hardware (but not in software) until the kernel started
adding validation for it.

Fix the command used by the selftest by making the policer drop on
exceed, and pass the packet to the next action (goto) on conform.

Fixes: 8cd6b020b644 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220503121428.842906-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 5 May 2022 02:22:34 +0000 (19:22 -0700)]

Merge branch 'insufficient-tcp-source-port-randomness'

Willy Tarreau says:

====================
insufficient TCP source port randomness

In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad
report being able to accurately identify a client by forcing it to emit
only 40 times more connections than the number of entries in the
table_perturb[] table, which is indexed by hashing the connection tuple.
The current 2^8 setting allows them to perform that attack with only 10k
connections, which is not hard to achieve in a few seconds.

Eric, Amit and I have been working on this for a few weeks now imagining,
testing and eliminating a number of approaches that Amit and his team were
still able to break or that were found to be too risky or too expensive,
and ended up with the simple improvements in this series that resists to
the attack, doesn't degrade the performance, and preserves a reliable port
selection algorithm to avoid connection failures, including the odd/even
port selection preference that allows bind() to always find a port quickly
even under strong connect() stress.

The approach relies on several factors:
  - resalting the hash secret that's used to choose the table_perturb[]
    entry every 10 seconds to eliminate slow attacks and force the
    attacker to forget everything that was learned after this delay.
    This already eliminates most of the problem because if a client
    stays silent for more than 10 seconds there's no link between the
    previous and the next patterns, and 10s isn't yet frequent enough
    to cause too frequent repetition of a same port that may induce a
    connection failure ;

  - adding small random increments to the source port. Previously, a
    random 0 or 1 was added every 16 ports. Now a random 0 to 7 is
    added after each port. This means that with the default 32768-60999
    range, a worst case rollover happens after 1764 connections, and
    an average of 3137. This doesn't stop statistical attacks but
    requires significantly more iterations of the same attack to
    confirm a guess.

  - increasing the table_perturb[] size from 2^8 to 2^16, which Amit
    says will require 2.6 million connections to be attacked with the
    changes above, making it pointless to get a fingerprint that will
    only last 10 seconds. Due to the size, the table was made dynamic.

  - a few minor improvements on the bits used from the hash, to eliminate
    some unfortunate correlations that may possibly have been exploited
    to design future attack models.

These changes were tested under the most extreme conditions, up to
1.1 million connections per second to one and a few targets, showing no
performance regression, and only 2 connection failures within 13 billion,
which is less than 2^-32 and perfectly within usual values.

The series is split into small reviewable changes and was already reviewed
by Amit and Eric.
====================

Link: https://lore.kernel.org/r/20220502084614.24123-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Willy Tarreau [Mon, 2 May 2022 08:46:14 +0000 (10:46 +0200)]

tcp: drop the hash_32() part from the index calculation

In commit 190cc82489f4 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.

Suggested-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Willy Tarreau [Mon, 2 May 2022 08:46:13 +0000 (10:46 +0200)]

tcp: increase source port perturb table to 2^16

Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.

Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.

A live test at 1 million connections per second showed no performance
difference from the previous value.

Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Willy Tarreau [Mon, 2 May 2022 08:46:12 +0000 (10:46 +0200)]

tcp: dynamically allocate the perturb table used by source ports

We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Willy Tarreau [Mon, 2 May 2022 08:46:11 +0000 (10:46 +0200)]

tcp: add small random increments to the source port

Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.

With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Mon, 2 May 2022 08:46:10 +0000 (10:46 +0200)]

tcp: resalt the secret every 10 seconds

In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Willy Tarreau [Mon, 2 May 2022 08:46:09 +0000 (10:46 +0200)]

tcp: use different parts of the port_offset for index and offset

Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Willy Tarreau [Mon, 2 May 2022 08:46:08 +0000 (10:46 +0200)]

secure_seq: use the 64 bits of the siphash for port offset calculation

SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32-bit only. In order to exploit more bits from the
hash, let's make the functions return the full 64-bit of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
remains done on 32-bit to avoid the need for div_u64_rem() and an extra
cost on 32-bit systems.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 5 May 2022 00:50:00 +0000 (17:50 -0700)]

Merge branch 'wireguard-patches-for-5-18-rc6'

Jason A. Donenfeld says:

====================
wireguard patches for 5.18-rc6

In working on some other problems, I wound up leaning on the WireGuard
CI more than usual and uncovered a few small issues with reliability.
These are fairly low key changes, since they don't impact kernel code
itself.

One change does stick out in particular, though, which is the "make
routing loop test non-fatal" commit. I'm not thrilled about doing this,
but currently [1] remains unsolved, and I'm still working on a real
solution to that (hopefully for 5.19 or 5.20 if I can come up with a
good idea...), so for now that test just prints a big red warning
instead.

[1] https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
====================

Link: https://lore.kernel.org/r/20220504202920.72908-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jason A. Donenfeld [Wed, 4 May 2022 20:29:20 +0000 (22:29 +0200)]

wireguard: selftests: set panic_on_warn=1 from cmdline

Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jason A. Donenfeld [Wed, 4 May 2022 20:29:19 +0000 (22:29 +0200)]

wireguard: selftests: bump package deps

Use newer, more reliable package dependencies. These should hopefully
reduce flakes. However, we keep the old iputils package, as it
accumulated bugs after resulting in flakes on slow machines.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jason A. Donenfeld [Wed, 4 May 2022 20:29:18 +0000 (22:29 +0200)]

wireguard: selftests: restore support for ccache

When moving to non-system toolchains, we inadvertantly killed the
ability to use ccache. So instead, build ccache support into the test
harness directly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jason A. Donenfeld [Wed, 4 May 2022 20:29:17 +0000 (22:29 +0200)]

wireguard: selftests: use newer toolchains to fill out architectures

Rather than relying on the system to have cross toolchains available,
simply download musl.cc's ones and use that libc.so, and then we use it
to fill in a few missing platforms, such as riscv64, riscv64, powerpc64,
and s390x.

Since riscv doesn't have a second serial port in its device description,
we have to use virtio's vport. This is actually the same situation on
ARM, but we were previously hacking QEMU up to work around this, which
required a custom QEMU. Instead just do the vport trick on ARM too.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jason A. Donenfeld [Wed, 4 May 2022 20:29:16 +0000 (22:29 +0200)]

wireguard: selftests: limit parallelism to $(nproc) tests at once

The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about that decrease the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Domain: System / Kernel;

RSS Atom