platform/kernel/linux-rpi.git
2 years agoPCI: mvebu: Fix device enumeration regression
Pali Rohár [Mon, 14 Feb 2022 11:02:28 +0000 (12:02 +0100)]
PCI: mvebu: Fix device enumeration regression

[ Upstream commit c49ae619905eebd3f54598a84e4cd2bd58ba8fe9 ]

Jan reported that on Turris Omnia (Armada 385), no PCIe devices were
detected after upgrading from v5.16.1 to v5.16.3 and identified the cause
as the backport of 91a8d79fc797 ("PCI: mvebu: Fix configuring secondary bus
of PCIe Root Port via emulated bridge"), which appeared in v5.17-rc1.

91a8d79fc797 was incorrectly applied from mailing list patch [1] to the
linux git repository [2] probably due to resolving merge conflicts
incorrectly. Fix it now.

[1] https://lore.kernel.org/r/20211125124605.25915-12-pali@kernel.org
[2] https://git.kernel.org/linus/91a8d79fc797

[bhelgaas: commit log]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215540
Fixes: 91a8d79fc797 ("PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge")
Link: https://lore.kernel.org/r/20220214110228.25825-1-pali@kernel.org
Link: https://lore.kernel.org/r/20220127234917.GA150851@bhelgaas
Reported-by: Jan Palus <jpalus@fastmail.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amd/display: For vblank_disable_immediate, check PSR is really used
Michel Dänzer [Tue, 15 Feb 2022 18:53:37 +0000 (19:53 +0100)]
drm/amd/display: For vblank_disable_immediate, check PSR is really used

[ Upstream commit 4d22336f903930eb94588b939c310743a3640276 ]

Even if PSR is allowed for a present GPU, there might be no eDP link
which supports PSR.

Fixes: 708978487304 ("drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agobnxt_en: Fix occasional ethtool -t loopback test failures
Michael Chan [Sun, 20 Feb 2022 09:05:49 +0000 (04:05 -0500)]
bnxt_en: Fix occasional ethtool -t loopback test failures

[ Upstream commit cfcab3b3b61584a02bb523ffa99564eafa761dfe ]

In the current code, we setup the port to PHY or MAC loopback mode
and then transmit a test broadcast packet for the loopback test.  This
scheme fails sometime if the port is shared with management firmware
that can also send packets.  The driver may receive the management
firmware's packet and the test will fail when the contents don't
match the test packet.

Change the test packet to use it's own MAC address as the destination
and setup the port to only receive it's own MAC address.  This should
filter out other packets sent by management firmware.

Fixes: 91725d89b97a ("bnxt_en: Add PHY loopback to ethtool self-test.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amd/display: Fix stream->link_enc unassigned during stream removal
Nicholas Kazlauskas [Tue, 25 Jan 2022 17:04:34 +0000 (12:04 -0500)]
drm/amd/display: Fix stream->link_enc unassigned during stream removal

[ Upstream commit 3743e7f6fcb938b7d8b7967e6a9442805e269b3d ]

[Why]
Found when running igt@kms_atomic.

Userspace attempts to do a TEST_COMMIT when 0 streams which calls
dc_remove_stream_from_ctx. This in turn calls link_enc_unassign
which ends up modifying stream->link = NULL directly, causing the
global link_enc to be removed preventing further link activity
and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign so this call is no longer necessary.

Fixes global state from being modified while unlocked.

Reviewed-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Acked-by: Jasdeep Dhillon <jdhillon@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agocifs: fix confusing unneeded warning message on smb2.1 and earlier
Steve French [Wed, 16 Feb 2022 19:23:53 +0000 (13:23 -0600)]
cifs: fix confusing unneeded warning message on smb2.1 and earlier

[ Upstream commit 53923e0fe2098f90f339510aeaa0e1413ae99a16 ]

When mounting with SMB2.1 or earlier, even with nomultichannel, we
log the confusing warning message:
  "CIFS: VFS: multichannel is not supported on this protocol version, use 3.0 or above"

Fix this so that we don't log this unless they really are trying
to mount with multichannel.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215608
Reported-by: Kim Scarborough <kim@scarborough.kim>
Cc: stable@vger.kernel.org # 5.11+
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agocifs: protect session channel fields with chan_lock
Shyam Prasad N [Mon, 19 Jul 2021 10:54:46 +0000 (10:54 +0000)]
cifs: protect session channel fields with chan_lock

[ Upstream commit 724244cdb3828522109c88e56a0242537aefabe9 ]

Introducing a new spin lock to protect all the channel related
fields in a cifs_ses struct. This lock should be taken
whenever dealing with the channel fields, and should be held
only for very short intervals which will not sleep.

Currently, all channel related fields in cifs_ses structure
are protected by session_mutex. However, this mutex is held for
long periods (sometimes while waiting for a reply from server).
This makes the codepath quite tricky to change.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/mediatek: mtk_dsi: Reset the dsi0 hardware
Enric Balletbo i Serra [Thu, 30 Sep 2021 08:31:50 +0000 (10:31 +0200)]
drm/mediatek: mtk_dsi: Reset the dsi0 hardware

[ Upstream commit 605c83753d97946aab176735020a33ebfb0b4615 ]

Reset dsi0 HW to default when power on. This prevents to have different
settingis between the bootloader and the kernel.

As not all Mediatek boards have the reset consumer configured in their
board description, also is not needed on all of them, the reset is optional,
so the change is compatible with all boards.

Cc: Jitao Shi <jitao.shi@mediatek.com>
Suggested-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20210930103105.v4.7.Idbb4727ddf00ba2fe796b630906baff10d994d89@changeid
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonet: ethernet: litex: Add the dependency on HAS_IOMEM
Cai Huoqing [Tue, 8 Feb 2022 01:33:08 +0000 (09:33 +0800)]
net: ethernet: litex: Add the dependency on HAS_IOMEM

[ Upstream commit 2427f03fb42f9dc14c53108f2c9b5563eb37e770 ]

The LiteX driver uses devm io function API which
needs HAS_IOMEM enabled, so add the dependency on HAS_IOMEM.

Fixes: ee7da21ac4c3 ("net: Add driver for LiteX's LiteETH network interface")
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/r/20220208013308.6563-1-cai.huoqing@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoof: net: move of_net under net/
Jakub Kicinski [Thu, 7 Oct 2021 01:06:54 +0000 (18:06 -0700)]
of: net: move of_net under net/

[ Upstream commit e330fb14590c5c80f7195c3d8c9b4bcf79e1a5cd ]

Rob suggests to move of_net.c from under drivers/of/ somewhere
to the networking code.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoibmvnic: don't release napi in __ibmvnic_open()
Sukadev Bhattiprolu [Tue, 8 Feb 2022 00:19:18 +0000 (16:19 -0800)]
ibmvnic: don't release napi in __ibmvnic_open()

[ Upstream commit 61772b0908c640d0309c40f7d41d062ca4e979fa ]

If __ibmvnic_open() encounters an error such as when setting link state,
it calls release_resources() which frees the napi structures needlessly.
Instead, have __ibmvnic_open() only clean up the work it did so far (i.e.
disable napi and irqs) and leave the rest to the callers.

If caller of __ibmvnic_open() is ibmvnic_open(), it should release the
resources immediately. If the caller is do_reset() or do_hard_reset(),
they will release the resources on the next reset.

This fixes following crash that occurred when running the drmgr command
several times to add/remove a vnic interface:

[102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq
[102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq
[102056] ibmvnic 30000003 env3: Replenished 8 pools
Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000010
Faulting instruction address: 0xc000000000a3c840
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
...
CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e2e37 #1
Workqueue: events_long __ibmvnic_reset [ibmvnic]
NIP:  c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820
REGS: c0000000548e37e0 TRAP: 0300   Not tainted  (5.16.0-rc5-autotest-g6441998e2e37)
MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28248484  XER: 00000004
CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000
...
NIP [c000000000a3c840] napi_enable+0x20/0xc0
LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic]
Call Trace:
[c0000000548e3a80] [0000000000000006] 0x6 (unreliable)
[c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic]
[c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic]
[c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570
[c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660
[c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0
[c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010
38a0fff6 e92d1100 f9210028 39200000 <e9030010f9010020 60420000 e9210020
---[ end trace 5f8033b08fd27706 ]---

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220208001918.900602-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonet: dsa: seville: register the mdiobus under devres
Vladimir Oltean [Mon, 7 Feb 2022 16:15:51 +0000 (18:15 +0200)]
net: dsa: seville: register the mdiobus under devres

[ Upstream commit bd488afc3b39e045ba71aab472233f2a78726e7b ]

As explained in commits:
74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")

mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), and that mdiobus was
not previously unregistered.

The Seville VSC9959 switch is a platform device, so the initial set of
constraints that I thought would cause this (I2C or SPI buses which call
->remove on ->shutdown) do not apply. But there is one more which
applies here.

If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the seville switch driver on shutdown.

So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.

The seville driver has a code structure that could accommodate both the
mdiobus_unregister and mdiobus_free calls, but it has an external
dependency upon mscc_miim_setup() from mdio-mscc-miim.c, which calls
devm_mdiobus_alloc_size() on its behalf. So rather than restructuring
that, and exporting yet one more symbol mscc_miim_teardown(), let's work
with devres and replace of_mdiobus_register with the devres variant.
When we use all-devres, we can ensure that devres doesn't free a
still-registered bus (it either runs both callbacks, or none).

Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonet: dsa: ocelot: seville: utilize of_mdiobus_register
Colin Foster [Mon, 29 Nov 2021 01:57:36 +0000 (17:57 -0800)]
net: dsa: ocelot: seville: utilize of_mdiobus_register

[ Upstream commit 5186c4a05b9713138b762a49467a8ab9753cdb36 ]

Switch seville to use of_mdiobus_register(bus, NULL) instead of just
mdiobus_register. This code is about to be pulled into a separate module
that can optionally define ports by the device_node.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agogve: Recording rx queue before sending to napi
Tao Liu [Mon, 7 Feb 2022 17:59:01 +0000 (09:59 -0800)]
gve: Recording rx queue before sending to napi

[ Upstream commit 084cbb2ec3af2d23be9de65fcc9493e21e265859 ]

This caused a significant performance degredation when using generic XDP
with multiple queues.

Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Tao Liu <xliutaox@google.com>
Link: https://lore.kernel.org/r/20220207175901.2486596-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/i915: Disable DRRS on IVB/HSW port != A
Ville Syrjälä [Fri, 28 Jan 2022 10:37:50 +0000 (12:37 +0200)]
drm/i915: Disable DRRS on IVB/HSW port != A

[ Upstream commit ee59792c97176f12c1da31f29fc4c2aab187f06e ]

Currently we allow DRRS on IVB PCH ports, but we're missing a
few programming steps meaning it is guaranteed to not work.
And on HSW DRRS is not supported on anything but port A ever
as only transcoder EDP has the M2/N2 registers (though I'm
not sure if HSW ever has eDP on any other port).

Starting from BDW all transcoders have the dynamically
reprogrammable M/N registers so DRRS could work on any
port.

Stop initializing DRRS on ports where it cannot possibly work.

Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220128103757.22461-11-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit f0d4ce59f4d48622044933054a0e0cefa91ba15e)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/i915/display: Move DRRS code its own file
José Roberto de Souza [Fri, 27 Aug 2021 17:42:52 +0000 (10:42 -0700)]
drm/i915/display: Move DRRS code its own file

[ Upstream commit a1b63119ee839c8ff622407aab25c9723943638a ]

intel_dp.c is a 5k lines monster, so moving DRRS out of it to reduce
some lines from it.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210827174253.51122-2-jose.souza@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/i915/display: split out dpt out of intel_display.c
Jani Nikula [Mon, 23 Aug 2021 12:25:31 +0000 (15:25 +0300)]
drm/i915/display: split out dpt out of intel_display.c

[ Upstream commit dc6d6158a6e8b11a11544a541583296d9323050f ]

Let's try to reduce the size of intel_display.c, not increase it.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/934a2a0db05e835f6843befef6082e2034f23b3a.1629721467.git.jani.nikula@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoriscv/mm: Add XIP_FIXUP for phys_ram_base
Palmer Dabbelt [Fri, 4 Feb 2022 21:13:37 +0000 (13:13 -0800)]
riscv/mm: Add XIP_FIXUP for phys_ram_base

[ Upstream commit 4b1c70aa8ed8249608bb991380cb8ff423edf49e ]

This manifests as a crash early in boot on VexRiscv.

Signed-off-by: Myrtle Shah <gatecat@ds0.me>
[Palmer: split commit]
Fixes: 6d7f91d914bc ("riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm: mxsfb: Fix NULL pointer dereference
Alexander Stein [Wed, 2 Feb 2022 08:17:55 +0000 (09:17 +0100)]
drm: mxsfb: Fix NULL pointer dereference

[ Upstream commit 622c9a3a7868e1eeca39c55305ca3ebec4742b64 ]

mxsfb should not ever dereference the NULL pointer which
drm_atomic_get_new_bridge_state is allowed to return.
Assume a fixed format instead.

Fixes: b776b0f00f24 ("drm: mxsfb: Use bus_format from the nearest bridge if present")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220202081755.145716-3-alexander.stein@ew.tq-group.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm: mxsfb: Set fallback bus format when the bridge doesn't provide one
Guido Günther [Mon, 11 Oct 2021 13:41:27 +0000 (15:41 +0200)]
drm: mxsfb: Set fallback bus format when the bridge doesn't provide one

[ Upstream commit 1db060509903b29d63fe2e39c14fd0f99c4a447e ]

If a bridge doesn't do any bus format handling MEDIA_BUS_FMT_FIXED is
returned. Fallback to a reasonable default (MEDIA_BUS_FMT_RGB888_1X24) in
that case.

This unbreaks e.g. using mxsfb with the nwl bridge and mipi dsi panels.

Reported-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Stefan Agner <stefan@agner.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/781f0352052cc50c823c199ef5f53c84902d0580.1633959458.git.agx@sigxcpu.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amd/display: Update watermark values for DCN301
Agustin Gutierrez [Fri, 28 Jan 2022 22:51:53 +0000 (17:51 -0500)]
drm/amd/display: Update watermark values for DCN301

[ Upstream commit 2d8ae25d233767171942a9fba5fd8f4a620996be ]

[Why]
There is underflow / visual corruption DCN301, for high
bandwidth MST DSC configurations such as 2x1440p144 or 2x4k60.

[How]
Use up-to-date watermark values for DCN301.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agobpf: Fix possible race in inc_misses_counter
He Fengqing [Sat, 22 Jan 2022 10:29:36 +0000 (10:29 +0000)]
bpf: Fix possible race in inc_misses_counter

[ Upstream commit 0e3135d3bfa5dfb658145238d2bc723a8e30c3a3 ]

It seems inc_misses_counter() suffers from same issue fixed in
the commit d979617aa84d ("bpf: Fixes possible race in update_prog_stats()
for 32bit arches"):
As it can run while interrupts are enabled, it could
be re-entered and the u64_stats syncp could be mangled.

Fixes: 9ed9e9ba2337 ("bpf: Count the number of times recursion was prevented")
Signed-off-by: He Fengqing <hefengqing@huawei.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20220122102936.1219518-1-hefengqing@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agobpf: Use u64_stats_t in struct bpf_prog_stats
Eric Dumazet [Tue, 26 Oct 2021 21:41:33 +0000 (14:41 -0700)]
bpf: Use u64_stats_t in struct bpf_prog_stats

[ Upstream commit 61a0abaee2092eee69e44fe60336aa2f5b578938 ]

Commit 316580b69d0a ("u64_stats: provide u64_stats_t type")
fixed possible load/store tearing on 64bit arches.

For instance the following C code

stats->nsecs += sched_clock() - start;

Could be rightfully implemented like this by a compiler,
confusing concurrent readers a lot:

stats->nsecs += sched_clock();
// arbitrary delay
stats->nsecs -= start;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonet/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
Raed Salem [Thu, 2 Dec 2021 15:43:50 +0000 (17:43 +0200)]
net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic

[ Upstream commit 5352859b3bfa0ca188b2f1d2c1436fddc781e3b6 ]

IPsec crypto offload always set the ethernet segment checksum flags with
the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
packet regardless of the encapsulated L4 header type, and even if it
doesn't exists in the first place, this breaks non TCP/UDP traffic as
such.

Set the inner L4 checksum flag only when the encapsulated L4 header
protocol is TCP/UDP using software parser swp_inner_l4_offset field as
indication.

Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonet/mlx5e: IPsec: Refactor checksum code in tx data path
Raed Salem [Tue, 26 Oct 2021 07:10:42 +0000 (10:10 +0300)]
net/mlx5e: IPsec: Refactor checksum code in tx data path

[ Upstream commit 428ffea0711a11efa0c1c4ee1fac27903ed091be ]

Part of code that is related solely to IPsec is always compiled in the
driver code regardless if the IPsec functionality is enabled or disabled
in the driver code, this will add unnecessary branch in case IPsec is
disabled at Tx data path.

Move IPsec related code to IPsec related file such that in case of IPsec
is disabled and because of unlikely macro the compiler should be able to
optimize and omit the checksum IPsec code all together from Tx data path

Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoocteontx2-af: Add KPU changes to parse NGIO as separate layer
Kiran Kumar K [Fri, 21 Jan 2022 06:34:47 +0000 (12:04 +0530)]
octeontx2-af: Add KPU changes to parse NGIO as separate layer

[ Upstream commit 745166fcf01cecc4f5ff3defc6586868349a43f9 ]

With current KPU profile NGIO is being parsed along with CTAG as
a single layer. Because of this MCAM/ntuple rules installed with
ethertype as 0x8842 are not being hit. Adding KPU profile changes
to parse NGIO in separate ltype and CTAG in separate ltype.

Fixes: f9c49be90c05 ("octeontx2-af: Update the default KPU profile and fixes")
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoocteontx2-af: Adjust LA pointer for cpt parse header
Kiran Kumar K [Wed, 29 Sep 2021 05:58:31 +0000 (11:28 +0530)]
octeontx2-af: Adjust LA pointer for cpt parse header

[ Upstream commit 85212a127e469c5560daf63a9782755ee4b03619 ]

In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the
start of cpt parse header. Since cpt parse header has veriable
length padding, this will be a problem for DMAC extraction. Adding
KPU profile changes to adjust the LA pointer to start at ether header
in case of cpt parse header by
   - Adding ptr advance in pkind 58 to a fixed value 40
   - Adding variable length offset 7 and mask 7 (pad len in
     CPT_PARSE_HDR).
Also added the missing static declaration for npc_set_var_len_offset_pkind
function.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoocteontx2-af: cn10k: Use appropriate register for LMAC enable
Geetha sowjanya [Fri, 21 Jan 2022 06:34:42 +0000 (12:04 +0530)]
octeontx2-af: cn10k: Use appropriate register for LMAC enable

[ Upstream commit fae80edeafbbba5ef9a0423aa5e5515518626433 ]

CN10K platforms uses RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG
register for lmac TX/RX enable whereas CN9xxx platforms use
CGX_CMRX_CONFIG register. This config change was missed when
adding support for CN10K RPM.

Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoocteontx2-af: cn10k: RPM hardware timestamp configuration
Hariprasad Kelam [Tue, 28 Sep 2021 11:30:59 +0000 (17:00 +0530)]
octeontx2-af: cn10k: RPM hardware timestamp configuration

[ Upstream commit d1489208681dfe432609fdaa49b160219c6e221c ]

MAC on CN10K support hardware timestamping such that 8 bytes addition
header is prepended to incoming packets. This patch does necessary
configuration to enable Hardware time stamping upon receiving request
from PF netdev interfaces.

Timestamp configuration is different on MAC (CGX) Octeontx2 silicon
and MAC (RPM) OcteonTX3 CN10k. Based on silicon variant appropriate
fn() pointer is called. Refactor MAC specific mbox messages to remove
unnecessary gaps in mboxids.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoocteontx2-af: Reset PTP config in FLR handler
Harman Kalra [Tue, 28 Sep 2021 11:30:58 +0000 (17:00 +0530)]
octeontx2-af: Reset PTP config in FLR handler

[ Upstream commit e37e08fffc373206ad4e905c05729ea6bbdcb22c ]

Upon receiving ptp config request from netdev interface , Octeontx2 MAC
block CGX is configured to append timestamp to every incoming packet
and NPC config is updated with DMAC offset change.

Currently this configuration is not reset in FLR handler. This patch
resets the same.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoocteontx2-af: Optimize KPU1 processing for variable-length headers
Kiran Kumar K [Fri, 24 Sep 2021 06:18:51 +0000 (11:48 +0530)]
octeontx2-af: Optimize KPU1 processing for variable-length headers

[ Upstream commit edadeb38dc2fa2550801995b748110c3e5e59557 ]

Optimized KPU1 entry processing for variable-length custom L2 headers
of size 24B, 90B by
- Moving LA LTYPE parsing for 24B and 90B headers to PKIND.
- Removing LA flags assignment for 24B and 90B headers.
- Reserving a PKIND 55 to parse variable length headers.

Also, new mailbox(NPC_SET_PKIND) added to configure PKIND with
corresponding variable-length offset, mask, and shift count
(NPC_AF_KPUX_ENTRYX_ACTION0).

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoethtool: Fix link extended state for big endian
Moshe Tal [Thu, 20 Jan 2022 09:55:50 +0000 (11:55 +0200)]
ethtool: Fix link extended state for big endian

[ Upstream commit e2f08207c558bc0bc8abaa557cdb29bad776ac7b ]

The link extended sub-states are assigned as enum that is an integer
size but read from a union as u8, this is working for small values on
little endian systems but for big endian this always give 0. Fix the
variable in the union to match the enum size.

Fixes: ecc31c60240b ("ethtool: Add link extended state")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amd/display: move FPU associated DSC code to DML folder
Qingqing Zhuo [Tue, 31 Aug 2021 11:52:24 +0000 (07:52 -0400)]
drm/amd/display: move FPU associated DSC code to DML folder

[ Upstream commit d738db6883df3e3c513f9e777c842262693f951b ]

[Why & How]
As part of the FPU isolation work documented in
https://patchwork.freedesktop.org/series/93042/, isolate code that uses
FPU in DSC to DML, where all FPU code should locate.

This change does not refactor any functions but move code around.

Cc: Christian König <christian.koenig@amd.com>
Cc: Hersen Wu <hersenxs.wu@amd.com>
Cc: Anson Jacob <Anson.Jacob@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
Tested-by: Anson Jacob <Anson.Jacob@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amd/display: Use adjusted DCN301 watermarks
Nikola Cornij [Wed, 8 Sep 2021 02:09:01 +0000 (22:09 -0400)]
drm/amd/display: Use adjusted DCN301 watermarks

[ Upstream commit 808643ea56a2f96a42873d5e11c399957d6493aa ]

[why]
If DCN30 watermark calc is used for DCN301, the calculated values are
wrong due to the data structure mismatch between DCN30 and DCN301.
However, using the original DCN301 watermark values causes underflow.

[how]
- Add DCN21-style watermark calculations
- Adjust DCN301 watermark values to remove the underflow

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdgpu: filter out radeon secondary ids as well
Alex Deucher [Thu, 20 Jan 2022 17:17:07 +0000 (12:17 -0500)]
drm/amdgpu: filter out radeon secondary ids as well

[ Upstream commit 9e5a14bce2402e84251a10269df0235cd7ce9234 ]

Older radeon boards (r2xx-r5xx) had secondary PCI functions
which we solely there for supporting multi-head on OSs with
special requirements.  Add them to the unsupported list
as well so we don't attempt to bind to them.  The driver
would fail to bind to them anyway, but this does so
in a cleaner way that should not confuse the user.

Cc: stable@vger.kernel.org
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdgpu: filter out radeon PCI device IDs
Alex Deucher [Tue, 3 Aug 2021 21:17:10 +0000 (17:17 -0400)]
drm/amdgpu: filter out radeon PCI device IDs

[ Upstream commit bdbeb0dde4258586bb2f481b12da1e83aa4766f3 ]

Once we claim all 0x1002 PCI display class devices, we will
need to filter out devices owned by radeon.

v2: rename radeon id array to make it more clear that
the devices are not supported by amdgpu.
    add r128, mach64 pci ids as well

Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled
Nicholas Kazlauskas [Tue, 30 Nov 2021 14:32:33 +0000 (09:32 -0500)]
drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled

[ Upstream commit 70897848730470cc477d5d89e6222c0f6a9ac173 ]

[Why]
PSR currently relies on the kernel's delayed vblank on/off mechanism
as an implicit bufferring mechanism to prevent excessive entry/exit.

Without this delay the user experience is impacted since it can take
a few frames to enter/exit.

[How]
Only allow vblank disable immediate for DC when psr is not supported.

Leave a TODO indicating that this support should be extended in the
future to delay independent of the vblank interrupt.

Fixes: 92020e81ddbeac ("drm/amdgpu/display: set vblank_disable_immediate for DC")

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agohugetlbfs: fix off-by-one error in hugetlb_vmdelete_list()
Sean Christopherson [Fri, 14 Jan 2022 22:08:30 +0000 (14:08 -0800)]
hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list()

[ Upstream commit d6aba4c8e20d4d2bf65d589953f6d891c178f3a3 ]

Pass "end - 1" instead of "end" when walking the interval tree in
hugetlb_vmdelete_list() to fix an inclusive vs.  exclusive bug.  The two
callers that pass a non-zero "end" treat it as exclusive, whereas the
interval tree iterator expects an inclusive "last".  E.g.  punching a
hole in a file that precisely matches the size of a single hugepage,
with a vma starting right on the boundary, will result in
unmap_hugepage_range() being called twice, with the second call having
start==end.

The off-by-one error doesn't cause functional problems as
__unmap_hugepage_range() turns into a massive nop due to
short-circuiting its for-loop on "address < end".  But, the mmu_notifier
invocations to invalid_range_{start,end}() are passed a bogus zero-sized
range, which may be unexpected behavior for secondary MMUs.

The bug was exposed by commit ed922739c919 ("KVM: Use interval tree to
do fast hva lookup in memslots"), currently queued in the KVM tree for
5.17, which added a WARN to detect ranges with start==end.

Link: https://lkml.kernel.org/r/20211228234257.1926057-1-seanjc@google.com
Fixes: 1bfad99ab425 ("hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reported-by: syzbot+4e697fe80a31aa7efe21@syzkaller.appspotmail.com
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoselftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting
Waiman Long [Fri, 14 Jan 2022 22:07:58 +0000 (14:07 -0800)]
selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting

[ Upstream commit 209376ed2a8431ccb4c40fdcef11194fc1e749b0 ]

The hugetlb cgroup reservation test charge_reserved_hugetlb.sh assume
that no cgroup filesystems are mounted before running the test.  That is
not true in many cases.  As a result, the test fails to run.  Fix that
by querying the current cgroup mount setting and using the existing
cgroup setup instead before attempting to freshly mount a cgroup
filesystem.

Similar change is also made for hugetlb_reparenting_test.sh as well,
though it still has problem if cgroup v2 isn't used.

The patched test scripts were run on a centos 8 based system to verify
that they ran properly.

Link: https://lkml.kernel.org/r/20220106201359.1646575-1-longman@redhat.com
Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agokasan: fix quarantine conflicting with init_on_free
Andrey Konovalov [Fri, 14 Jan 2022 22:05:01 +0000 (14:05 -0800)]
kasan: fix quarantine conflicting with init_on_free

[ Upstream commit 26dca996ea7b1ac7008b6b6063fc88b849e3ac3e ]

KASAN's quarantine might save its metadata inside freed objects.  As
this happens after the memory is zeroed by the slab allocator when
init_on_free is enabled, the memory coming out of quarantine is not
properly zeroed.

This causes lib/test_meminit.c tests to fail with Generic KASAN.

Zero the metadata when the object is removed from quarantine.

Link: https://lkml.kernel.org/r/2805da5df4b57138fdacd671f5d227d58950ba54.1640037083.git.andreyknvl@google.com
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agomm: defer kmemleak object creation of module_alloc()
Kefeng Wang [Fri, 14 Jan 2022 22:04:11 +0000 (14:04 -0800)]
mm: defer kmemleak object creation of module_alloc()

[ Upstream commit 60115fa54ad7b913b7cb5844e6b7ffeb842d55f2 ]

Yongqiang reports a kmemleak panic when module insmod/rmmod with KASAN
enabled(without KASAN_VMALLOC) on x86[1].

When the module area allocates memory, it's kmemleak_object is created
successfully, but the KASAN shadow memory of module allocation is not
ready, so when kmemleak scan the module's pointer, it will panic due to
no shadow memory with KASAN check.

  module_alloc
    __vmalloc_node_range
      kmemleak_vmalloc
kmemleak_scan
  update_checksum
    kasan_module_alloc
      kmemleak_ignore

Note, there is no problem if KASAN_VMALLOC enabled, the modules area
entire shadow memory is preallocated.  Thus, the bug only exits on ARCH
which supports dynamic allocation of module area per module load, for
now, only x86/arm64/s390 are involved.

Add a VM_DEFER_KMEMLEAK flags, defer vmalloc'ed object register of
kmemleak in module_alloc() to fix this issue.

[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/

[wangkefeng.wang@huawei.com: fix build]
Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82de4 ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc682 ("arm64: add KASAN support")
Fixes: bebf56a1b176 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agotracing/probes: check the return value of kstrndup() for pbuf
Xiaoke Wang [Tue, 14 Dec 2021 02:26:46 +0000 (10:26 +0800)]
tracing/probes: check the return value of kstrndup() for pbuf

[ Upstream commit 1c1857d400355e96f0fe8b32adc6fa7594d03b52 ]

kstrndup() is a memory allocation-related function, it returns NULL when
some internal memory errors happen. It is better to check the return
value of it so to catch the memory error in time.

Link: https://lkml.kernel.org/r/tencent_4D6E270731456EB88712ED7F13883C334906@qq.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agotracing/uprobes: Check the return value of kstrdup() for tu->filename
Xiaoke Wang [Tue, 14 Dec 2021 01:28:02 +0000 (09:28 +0800)]
tracing/uprobes: Check the return value of kstrdup() for tu->filename

[ Upstream commit 8c7224245557707c613f130431cafbaaa4889615 ]

kstrdup() returns NULL when some internal memory errors happen, it is
better to check the return value of it so to catch the memory error in
time.

Link: https://lkml.kernel.org/r/tencent_3C2E330722056D7891D2C83F29C802734B06@qq.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodma-buf: cma_heap: Fix mutex locking section
Weizhao Ouyang [Tue, 4 Jan 2022 07:35:45 +0000 (15:35 +0800)]
dma-buf: cma_heap: Fix mutex locking section

[ Upstream commit 54329e6f7beea6af56c1230da293acc97d6a6ee7 ]

Fix cma_heap_buffer mutex locking critical section to protect vmap_cnt
and vaddr.

Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the cma_heap implementation")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220104073545.124244-1-o451686892@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoi3c: master: dw: check return of dw_i3c_master_get_free_pos()
Tom Rix [Sat, 8 Jan 2022 15:09:48 +0000 (07:09 -0800)]
i3c: master: dw: check return of dw_i3c_master_get_free_pos()

[ Upstream commit 13462ba1815db5a96891293a9cfaa2451f7bd623 ]

Clang static analysis reports this problem
dw-i3c-master.c:799:9: warning: The result of the left shift is
  undefined because the left operand is negative
                      COMMAND_PORT_DEV_INDEX(pos) |
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~

pos can be negative because dw_i3c_master_get_free_pos() can return an
error.  So check for an error.

Fixes: 1dd728f5d4d4 ("i3c: master: Add driver for Synopsys DesignWare IP")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220108150948.3988790-1-trix@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt
Guchun Chen [Fri, 7 Jan 2022 08:31:20 +0000 (16:31 +0800)]
drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt

[ Upstream commit 2096b74b1da5ca418827b54ac4904493bd9de89c ]

This is observed in SRIOV case with virtual KMS as display.

_raw_spin_lock_irqsave+0x37/0x40
drm_handle_vblank+0x69/0x350 [drm]
? try_to_wake_up+0x432/0x5c0
? amdgpu_vkms_prepare_fb+0x1c0/0x1c0 [amdgpu]
drm_crtc_handle_vblank+0x17/0x20 [drm]
amdgpu_vkms_vblank_simulate+0x4d/0x80 [amdgpu]
__hrtimer_run_queues+0xfb/0x230
hrtimer_interrupt+0x109/0x220
__sysvec_apic_timer_interrupt+0x64/0xe0
asm_call_irq_on_stack+0x12/0x20

Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Kelly Zytaruk <kelly.zytaruk@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdkfd: Check for null pointer after calling kmemdup
Jiasheng Jiang [Wed, 5 Jan 2022 09:09:43 +0000 (17:09 +0800)]
drm/amdkfd: Check for null pointer after calling kmemdup

[ Upstream commit abfaf0eee97925905e742aa3b0b72e04a918fa9e ]

As the possible failure of the allocation, kmemdup() may return NULL
pointer.
Therefore, it should be better to check the 'props2' in order to prevent
the dereference of NULL pointer.

Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agontb_hw_switchtec: Fix bug with more than 32 partitions
Wesley Sheng [Fri, 24 Dec 2021 01:23:30 +0000 (17:23 -0800)]
ntb_hw_switchtec: Fix bug with more than 32 partitions

[ Upstream commit 7ff351c86b6b258f387502ab2c9b9d04f82c1c3d ]

Switchtec could support as mush as 48 partitions, but ffs & fls are
for 32 bit argument, in case of partition index larger than 31, the
current code could not parse the peer partition index correctly.
Change to the 64 bit version __ffs64 & fls64 accordingly to fix this
bug.

Fixes: 3df54c870f52 ("ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups")
Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com>
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agontb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all
Jeremy Pallotta [Fri, 24 Dec 2021 01:23:29 +0000 (17:23 -0800)]
ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all

[ Upstream commit 32c3d375b0ed84b6acb51ae5ebef35ff0d649d85 ]

Array mmio_part_cfg_all holds the partition configuration of all
partitions, with partition number as index. Fix this by reading into
mmio_part_cfg_all for pff.

Fixes: 0ee28f26f378 ("NTB: switchtec_ntb: Add link management")
Signed-off-by: Jeremy Pallotta <jmpallotta@gmail.com>
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self...
Liu Ying [Thu, 30 Dec 2021 04:06:26 +0000 (12:06 +0800)]
drm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self refresh mode

[ Upstream commit 69e630016ef4e4a1745310c446f204dc6243e907 ]

Actual hardware state of CRTC is controlled by the member 'active' in
struct drm_crtc_state instead of the member 'enable', according to the
kernel doc of the member 'enable'.  In fact, the drm client modeset
and atomic helpers are using the member 'active' to do the control.

Referencing the member 'enable' of new_crtc_state, the function
crtc_needs_disable() may fail to reflect if CRTC needs disable in
self refresh mode, e.g., when the framebuffer emulation will be blanked
through the client modeset helper with the next commit, the member
'enable' of new_crtc_state is still true while the member 'active' is
false, hence the relevant potential encoder and bridges won't be disabled.

So, let's check new_crtc_state->active to determine if CRTC needs disable
in self refresh mode instead of new_crtc_state->enable.

Fixes: 1452c25b0e60 ("drm: Add helpers to kick off self refresh mode in drivers")
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211230040626.646807-1-victor.liu@nxp.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get
Miaoqian Lin [Fri, 7 Jan 2022 08:36:32 +0000 (08:36 +0000)]
drm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get

[ Upstream commit c71af3dae3e34d2fde0c19623cf7f8483321f0e3 ]

The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the error handling path.

Fixes: 9bf3797796f5 ("drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220107083633.20843-1-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoSUNRPC: Fix sockaddr handling in svcsock_accept_class trace points
Chuck Lever [Sat, 8 Jan 2022 21:59:54 +0000 (16:59 -0500)]
SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points

[ Upstream commit 16720861675393a35974532b3c837d9fd7bfe08c ]

Avoid potentially hazardous memory copying and the needless use of
"%pIS" -- in the kernel, an RPC service listener is always bound to
ANYADDR. Having the network namespace is helpful when recording
errors, though.

Fixes: a0469f46faab ("SUNRPC: Replace dprintk call sites in TCP state change callouts")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoSUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point
Chuck Lever [Sun, 9 Jan 2022 18:26:51 +0000 (13:26 -0500)]
SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point

[ Upstream commit dc6c6fb3d639756a532bcc47d4a9bf9f3965881b ]

While testing, I got an unexpected KASAN splat:

Jan 08 13:50:27 oracle-102.nfsv4.dev kernel: BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_svc_xprt_create_err+0x190/0x210 [sunrpc]
Jan 08 13:50:27 oracle-102.nfsv4.dev kernel: Read of size 28 at addr ffffc9000008f728 by task mount.nfs/4628

The memcpy() in the TP_fast_assign section of this trace point
copies the size of the destination buffer in order that the buffer
won't be overrun.

In other similar trace points, the source buffer for this memcpy is
a "struct sockaddr_storage" so the actual length of the source
buffer is always long enough to prevent the memcpy from reading
uninitialized or unallocated memory.

However, for this trace point, the source buffer can be as small as
a "struct sockaddr_in". For AF_INET sockaddrs, the memcpy() reads
memory that follows the source buffer, which is not always valid
memory.

To avoid copying past the end of the passed-in sockaddr, make the
source address's length available to the memcpy(). It would be a
little nicer if the tracing infrastructure was more friendly about
storing socket addresses that are not AF_INET, but I could not find
a way to make printk("%pIS") work with a dynamic array.

Reported-by: KASAN
Fixes: 4b8f380e46e4 ("SUNRPC: Tracepoint to record errors in svc_xpo_create()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/i915: don't call free_mmap_offset when purging
Matthew Auld [Thu, 6 Jan 2022 17:49:07 +0000 (17:49 +0000)]
drm/i915: don't call free_mmap_offset when purging

[ Upstream commit 4c2602ba8d74c35d550ed3d518809c697de08d88 ]

The TTM backend is in theory the only user here(also purge should only
be called once we have dropped the pages), where it is setup at object
creation and is only removed once the object is destroyed. Also
resetting the node here might be iffy since the ttm fault handler
uses the stored fake offset to determine the page offset within the pages
array.

This also blows up in the dontneed-before-mmap test, since the
expectation is that the vma_node will live on, until the object is
destroyed:

<2> [749.062902] kernel BUG at drivers/gpu/drm/i915/gem/i915_gem_ttm.c:943!
<4> [749.062923] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4> [749.062928] CPU: 0 PID: 1643 Comm: gem_madvise Tainted: G     U  W         5.16.0-rc8-CI-CI_DRM_11046+ #1
<4> [749.062933] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390 Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
<4> [749.062937] RIP: 0010:i915_ttm_mmap_offset.cold.35+0x5b/0x5d [i915]
<4> [749.063044] Code: 00 48 c7 c2 a0 23 4e a0 48 c7 c7 26 df 4a a0 e8 95 1d d0 e0 bf 01 00 00 00 e8 8b ec cf e0 31 f6 bf 09 00 00 00 e8 5f 30 c0 e0 <0f> 0b 48 c7 c1 24 4b 56 a0 ba 5b 03 00 00 48 c7 c6 c0 23 4e a0 48
<4> [749.063052] RSP: 0018:ffffc90002ab7d38 EFLAGS: 00010246
<4> [749.063056] RAX: 0000000000000240 RBX: ffff88811f2e61c0 RCX: 0000000000000006
<4> [749.063060] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
<4> [749.063063] RBP: ffffc90002ab7e58 R08: 0000000000000001 R09: 0000000000000001
<4> [749.063067] R10: 000000000123d0f8 R11: ffffc90002ab7b20 R12: ffff888112a1a000
<4> [749.063071] R13: 0000000000000004 R14: ffff88811f2e61c0 R15: ffff888112a1a000
<4> [749.063074] FS:  00007f6e5fcad500(0000) GS:ffff8884ad600000(0000) knlGS:0000000000000000
<4> [749.063078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [749.063081] CR2: 00007efd264e39f0 CR3: 0000000115fd6005 CR4: 00000000003706f0
<4> [749.063085] Call Trace:
<4> [749.063087]  <TASK>
<4> [749.063089]  __assign_mmap_offset+0x41/0x300 [i915]
<4> [749.063171]  __assign_mmap_offset_handle+0x159/0x270 [i915]
<4> [749.063248]  ? i915_gem_dumb_mmap_offset+0x70/0x70 [i915]
<4> [749.063325]  drm_ioctl_kernel+0xae/0x140
<4> [749.063330]  drm_ioctl+0x201/0x3d0
<4> [749.063333]  ? i915_gem_dumb_mmap_offset+0x70/0x70 [i915]
<4> [749.063409]  ? do_user_addr_fault+0x200/0x670
<4> [749.063415]  __x64_sys_ioctl+0x6d/0xa0
<4> [749.063419]  do_syscall_64+0x3a/0xb0
<4> [749.063423]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4> [749.063428] RIP: 0033:0x7f6e5f100317

Testcase: igt/gem_madvise/dontneed-before-mmap
Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-1-matthew.auld@intel.com
(cherry picked from commit 658a0c632625e1db51837ff754fe18a6a7f2ccf8)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agox86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
Vitaly Kuznetsov [Thu, 6 Jan 2022 09:46:11 +0000 (10:46 +0100)]
x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()

[ Upstream commit 51500b71d500f251037ed339047a4d9e7d7e295b ]

KASAN detected the following issue:

 BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060
 Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33

 CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1
 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
     BIOS Hyper-V UEFI Release v4.0 12/17/2019
 Call Trace:
  dump_stack_lvl+0x57/0x7d
  print_address_description.constprop.0+0x1f/0x140
  ? hyperv_flush_tlb_multi+0xf88/0x1060
  __kasan_report.cold+0x7f/0x11e
  ? hyperv_flush_tlb_multi+0xf88/0x1060
  kasan_report+0x38/0x50
  hyperv_flush_tlb_multi+0xf88/0x1060
  flush_tlb_mm_range+0x1b1/0x200
  ptep_clear_flush+0x10e/0x150
...
 Allocated by task 0:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x7c/0x90
  hv_common_init+0xae/0x115
  hyperv_init+0x97/0x501
  apic_intr_mode_init+0xb3/0x1e0
  x86_late_time_init+0x92/0xa2
  start_kernel+0x338/0x3eb
  secondary_startup_64_no_verify+0xc2/0xcb

 The buggy address belongs to the object at ffff8880011cc800
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 960 bytes inside of
  1024-byte region [ffff8880011cc800ffff8880011ccc00)

'hyperv_flush_tlb_multi+0xf88/0x1060' points to
hv_cpu_number_to_vp_number() and '960 bytes' means we're trying to get
VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240 so we're trying to
access past hv_vp_index's last element. This can (and will) happen
when 'cpus' mask is empty and cpumask_last() will return '>=nr_cpus'.

Commit ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has
been disabled") tried to deal with empty cpumask situation but
apparently didn't fully fix the issue.

'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is
'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every
time the particular mm is scheduled/unscheduled on some CPU (see
switch_mm_irqs_off()), disabling IRQs on the CPU which is performing remote
TLB flush has zero influence on whether the particular process can get
scheduled/unscheduled on _other_ CPUs so e.g. in the case where the mm was
scheduled on one other CPU and got unscheduled during
hyperv_flush_tlb_multi()'s execution will lead to cpumask becoming empty.

It doesn't seem that there's a good way to protect 'mm_cpumask(mm)'
from changing during hyperv_flush_tlb_multi()'s execution. It would be
possible to copy it in the very beginning of the function but this is a
waste. It seems we can deal with changing cpumask just fine.

When 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s
execution, there are two possible issues:
- 'Under-flushing': we will not flush TLB on a CPU which got added to
the mask while hyperv_flush_tlb_multi() was already running. This is
not a problem as this is equal to mm getting scheduled on that CPU
right after TLB flush.
- 'Over-flushing': we may flush TLB on a CPU which is already cleared
from the mask. First, extra TLB flush preserves correctness. Second,
Hyper-V's TLB flush hypercall takes 'mm->pgd' argument so Hyper-V may
avoid the flush if CR3 doesn't match.

Fix the immediate issue with cpumask_last()/hv_cpu_number_to_vp_number()
and remove the pointless cpumask_empty() check from the beginning of the
function as it really doesn't protect anything. Also, avoid the hypercall
altogether when 'flush->processor_mask' ends up being empty.

Fixes: ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has been disabled")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220106094611.1404218-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agonfsd: fix crash on COPY_NOTIFY with special stateid
J. Bruce Fields [Wed, 5 Jan 2022 19:15:03 +0000 (14:15 -0500)]
nfsd: fix crash on COPY_NOTIFY with special stateid

[ Upstream commit 074b07d94e0bb6ddce5690a9b7e2373088e8b33a ]

RTM says "If the special ONE stateid is passed to
nfs4_preprocess_stateid_op(), it returns status=0 but does not set
*cstid. nfsd4_copy_notify() depends on stid being set if status=0, and
thus can crash if the client sends the right COPY_NOTIFY RPC."

RFC 7862 says "The cna_src_stateid MUST refer to either open or locking
states provided earlier by the server.  If it is invalid, then the
operation MUST fail."

The RFC doesn't specify an error, and the choice doesn't matter much as
this is clearly illegal client behavior, but bad_stateid seems
reasonable.

Simplest is just to guarantee that nfs4_preprocess_stateid_op, called
with non-NULL cstid, errors out if it can't return a stateid.

Reported-by: rtm@csail.mit.edu
Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoRevert "nfsd: skip some unnecessary stats in the v4 case"
Chuck Lever [Fri, 24 Dec 2021 19:22:28 +0000 (14:22 -0500)]
Revert "nfsd: skip some unnecessary stats in the v4 case"

[ Upstream commit 58f258f65267542959487dbe8b5641754411843d ]

On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes
returning a reasonable-looking value in the cinfo.before field and
zero in the cinfo.after field.

RFC 8881 Section 10.8.1 says:
> When a client is making changes to a given directory, it needs to
> determine whether there have been changes made to the directory by
> other clients.  It does this by using the change attribute as
> reported before and after the directory operation in the associated
> change_info4 value returned for the operation.

and

> ... The post-operation change
> value needs to be saved as the basis for future change_info4
> comparisons.

A good quality client implementation therefore saves the zero
cinfo.after value. During a subsequent OPEN operation, it will
receive a different non-zero value in the cinfo.before field for
that directory, and it will incorrectly believe the directory has
changed, triggering an undesirable directory cache invalidation.

There are filesystem types where fs_supports_change_attribute()
returns false, tmpfs being one. On NFSv4 mounts, this means the
fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is
never invoked. Subsequently, nfsd4_change_attribute() is invoked
with an uninitialized @stat argument.

In fill_pre_wcc(), @stat contains stale stack garbage, which is
then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all
zeroes, so zero is placed on the wire. Both of these values are
meaningless.

This fix can be applied immediately to stable kernels. Once there
are more regression tests in this area, this optimization can be
attempted again.

Fixes: 428a23d2bf0c ("nfsd: skip some unnecessary stats in the v4 case")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoNFSD: Fix verifier returned in stable WRITEs
Chuck Lever [Tue, 28 Dec 2021 17:35:43 +0000 (12:35 -0500)]
NFSD: Fix verifier returned in stable WRITEs

[ Upstream commit f11ad7aa653130b71e2e89bed207f387718216d5 ]

RFC 8881 explains the purpose of the write verifier this way:

> The final portion of the result is the field writeverf. This field
> is the write verifier and is a cookie that the client can use to
> determine whether a server has changed instance state (e.g., server
> restart) between a call to WRITE and a subsequent call to either
> WRITE or COMMIT.

But then it says:

> This cookie MUST be unchanged during a single instance of the
> NFSv4.1 server and MUST be unique between instances of the NFSv4.1
> server. If the cookie changes, then the client MUST assume that
> any data written with an UNSTABLE4 value for committed and an old
> writeverf in the reply has been lost and will need to be
> recovered.

RFC 1813 has similar language for NFSv3. NFSv2 does not have a write
verifier since it doesn't implement the COMMIT procedure.

Since commit 19e0663ff9bc ("nfsd: Ensure sampling of the write
verifier is atomic with the write"), the Linux NFS server has
returned a boot-time-based verifier for UNSTABLE WRITEs, but a zero
verifier for FILE_SYNC and DATA_SYNC WRITEs. FILE_SYNC and DATA_SYNC
WRITEs are not followed up with a COMMIT, so there's no need for
clients to compare verifiers for stable writes.

However, by returning a different verifier for stable and unstable
writes, the above commit puts the Linux NFS server a step farther
out of compliance with the first MUST above. At least one NFS client
(FreeBSD) noticed the difference, making this a potential
regression.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Link: https://lore.kernel.org/linux-nfs/YQXPR0101MB096857EEACF04A6DF1FC6D9BDD749@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM/T/
Fixes: 19e0663ff9bc ("nfsd: Ensure sampling of the write verifier is atomic with the write")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge
Pali Rohár [Thu, 25 Nov 2021 12:46:05 +0000 (13:46 +0100)]
PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge

[ Upstream commit 4ab34548c55fbbb3898306a47dfaccd4860e1ccb ]

Armada XP and new hardware supports access to DEVCAP2, DEVCTL2 and LNKCTL2
configuration registers of PCIe core via PCIE_CAP_PCIEXP. So export them
via emulated software root bridge.

Pre-XP hardware does not support these registers and returns zeros.

Link: https://lore.kernel.org/r/20211125124605.25915-16-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge
Pali Rohár [Thu, 25 Nov 2021 12:46:04 +0000 (13:46 +0100)]
PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge

[ Upstream commit 838ff44a398ff47fe9b924961d91aee325821220 ]

PME Status bit in Root Status Register (PCIE_RC_RTSTA_OFF) is read-only and
can be cleared only by writing 0b to the Interrupt Cause RW0C register
(PCIE_INT_CAUSE_OFF).

Link: https://lore.kernel.org/r/20211125124605.25915-15-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge
Pali Rohár [Thu, 25 Nov 2021 12:46:03 +0000 (13:46 +0100)]
PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge

[ Upstream commit ecae073e393e65ee7be7ebf3fdd5258ab99f1636 ]

Comment in Armada 370 functional specification is misleading.
PCI_EXP_DEVCTL_*RE bits are supported and configures receiving of error
interrupts.

Link: https://lore.kernel.org/r/20211125124605.25915-14-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
Pali Rohár [Thu, 25 Nov 2021 12:46:02 +0000 (13:46 +0100)]
PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge

[ Upstream commit d75404cc08832206f173668bd35391c581fea121 ]

Hardware supports PCIe Hot Reset via PCIE_CTRL_OFF register. Use it for
implementing PCI_BRIDGE_CTL_BUS_RESET bit of PCI_BRIDGE_CONTROL register on
emulated bridge.

With this change the function pci_reset_secondary_bus() starts working and
can reset connected PCIe card.

Link: https://lore.kernel.org/r/20211125124605.25915-13-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Setup PCIe controller to Root Complex mode
Pali Rohár [Thu, 25 Nov 2021 12:45:59 +0000 (13:45 +0100)]
PCI: mvebu: Setup PCIe controller to Root Complex mode

[ Upstream commit df08ac016124bd88b8598ac0599d7b89c0642774 ]

This driver operates only in Root Complex mode, so ensure that hardware is
properly configured in Root Complex mode.

Link: https://lore.kernel.org/r/20211125124605.25915-10-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge
Pali Rohár [Thu, 25 Nov 2021 12:46:01 +0000 (13:46 +0100)]
PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge

[ Upstream commit 91a8d79fc797d3486ae978beebdfc55261c7d65b ]

It looks like that mvebu PCIe controller has for each PCIe link fully
independent PCIe host bridge and so every PCIe Root Port is isolated not
only on its own bus but also isolated from each others. But in past device
tree structure was defined to put all PCIe Root Ports (as PCI Bridge
devices) into one root bus 0 and this bus is emulated by pci-mvebu.c
driver.

Probably reason for this decision was incorrect understanding of PCIe
topology of these Armada SoCs and also reason of misunderstanding how is
PCIe controller generating Type 0 and Type 1 config requests (it is fully
different compared to other drivers). Probably incorrect setup leaded to
very surprised things like having PCIe Root Port (PCI Bridge device, with
even incorrect Device Class set to Memory Controller) and the PCIe device
behind the Root Port on the same PCI bus, which obviously was needed to
somehow hack (as these two devices cannot be in reality on the same bus).

Properly set mvebu local bus number and mvebu local device number based on
PCI Bridge secondary bus number configuration. Also correctly report
configured secondary bus number in config space. And explain in driver
comment why this setup is correct.

Link: https://lore.kernel.org/r/20211125124605.25915-12-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge
Pali Rohár [Thu, 25 Nov 2021 12:45:56 +0000 (13:45 +0100)]
PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge

[ Upstream commit e42b85583719adb87ab88dc7bcd41b38011f7d11 ]

According to PCI specifications bits [0:2] of Command Register, this should
be by default disabled on reset. So explicitly disable these bits at early
beginning of driver initialization.

Also remove code which unconditionally enables all 3 bits and let kernel
code (via pci_set_master() function) to handle bus mastering of PCI Bridge
via emulated PCI_COMMAND on emulated bridge.

Adjust existing functions mvebu_pcie_handle_iobase_change() and
mvebu_pcie_handle_membase_change() to handle PCI_IO_BASE and PCI_MEM_BASE
registers correctly even when bus mastering on emulated bridge is disabled.

Link: https://lore.kernel.org/r/20211125124605.25915-7-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Do not modify PCI IO type bits in conf_write
Pali Rohár [Thu, 25 Nov 2021 12:45:57 +0000 (13:45 +0100)]
PCI: mvebu: Do not modify PCI IO type bits in conf_write

[ Upstream commit 2cf150216e5b5619d7c25180ccf2cc8ac7bebc13 ]

PCI IO type bits are already initialized in mvebu_pci_bridge_emul_init()
function and only when IO support is enabled. These type bits are read-only
and pci-bridge-emul.c code already does not allow to modify them from upper
layers.

When IO support is disabled then all IO registers should be read-only and
return zeros. Therefore do not modify PCI IO type bits in
mvebu_pci_bridge_emul_base_conf_write() callback.

Link: https://lore.kernel.org/r/20211125124605.25915-8-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mvebu: Check for errors from pci_bridge_emul_init() call
Pali Rohár [Thu, 25 Nov 2021 12:45:52 +0000 (13:45 +0100)]
PCI: mvebu: Check for errors from pci_bridge_emul_init() call

[ Upstream commit 5d18d702e5c9309f4195653475c7a7fdde4ca71f ]

Function pci_bridge_emul_init() may fail so correctly check for errors.

Link: https://lore.kernel.org/r/20211125124605.25915-3-pali@kernel.org
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoInput: ti_am335x_tsc - fix STEPCONFIG setup for Z2
Dario Binacchi [Mon, 13 Dec 2021 05:14:48 +0000 (21:14 -0800)]
Input: ti_am335x_tsc - fix STEPCONFIG setup for Z2

[ Upstream commit 6bfeb6c21e1bdc11c328b7d996d20f0f73c6b9b0 ]

The Z2 step configuration doesn't erase the SEL_INP_SWC_3_0 bit-field
before setting the ADC channel. This way its value could be corrupted by
the ADC channel selected for the Z1 coordinate.

Fixes: 8c896308feae ("input: ti_am335x_adc: use only FIFO0 and clean up a little")
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Link: https://lore.kernel.org/r/20211212125358.14416-3-dariobin@libero.it
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoInput: ti_am335x_tsc - set ADCREFM for X configuration
Dario Binacchi [Mon, 13 Dec 2021 05:14:35 +0000 (21:14 -0800)]
Input: ti_am335x_tsc - set ADCREFM for X configuration

[ Upstream commit 73cca71a903202cddc8279fc76b2da4995da5bea ]

As reported by the STEPCONFIG[1-16] registered field descriptions of the
TI reference manual, for the ADC "in single ended, SEL_INM_SWC_3_0 must
be 1xxx".

Unlike the Y and Z coordinates, this bit has not been set for the step
configuration registers used to sample the X coordinate.

Fixes: 1b8be32e6914 ("Input: add support for TI Touchscreen controller")
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Link: https://lore.kernel.org/r/20211212125358.14416-2-dariobin@libero.it
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agotracing: Do not let synth_events block other dyn_event systems during create
Beau Belgrave [Thu, 30 Sep 2021 22:38:21 +0000 (15:38 -0700)]
tracing: Do not let synth_events block other dyn_event systems during create

[ Upstream commit 4f67cca70c0f615e9cfe6ac42244f3416ec60877 ]

synth_events is returning -EINVAL if the dyn_event create command does
not contain ' \t'. This prevents other systems from getting called back.
synth_events needs to return -ECANCELED in these cases when the command
is not targeting the synth_event system.

Link: https://lore.kernel.org/linux-trace-devel/20210930223821.11025-1-beaub@linux.microsoft.com
Fixes: c9e759b1e8456 ("tracing: Rework synthetic event command parsing")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoi3c/master/mipi-i3c-hci: Fix a potentially infinite loop in 'hci_dat_v1_get_index()'
Christophe JAILLET [Wed, 17 Nov 2021 22:05:23 +0000 (23:05 +0100)]
i3c/master/mipi-i3c-hci: Fix a potentially infinite loop in 'hci_dat_v1_get_index()'

[ Upstream commit 3f43926f271287fb1744c9ac9ae1122497f2b0c2 ]

The code in 'hci_dat_v1_get_index()' really looks like a hand coded version
of 'for_each_set_bit()', except that a +1 is missing when searching for the
next set bit.

This really looks odd and it seems that it will loop until 'dat_w0_read()'
returns the expected result.

So use 'for_each_set_bit()' instead. It is less verbose and should be more
correct.

Fixes: 9ad9a52cce28 ("i3c/master: introduce the mipi-i3c-hci driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/0cdf3cb10293ead1acd271fdb8a70369c298c082.1637186628.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoi3c: fix incorrect address slot lookup on 64-bit
Jamie Iles [Wed, 22 Sep 2021 16:56:00 +0000 (17:56 +0100)]
i3c: fix incorrect address slot lookup on 64-bit

[ Upstream commit f18f98110f2b179792cb70d85cba697320a3790f ]

The address slot bitmap is an array of unsigned long's which are the
same size as an int on 32-bit platforms but not 64-bit.  Loading the
bitmap into an int could result in the incorrect status being returned
for a slot and slots being reported as the wrong status.

Fixes: 3a379bbcea0a ("i3c: Add core I3C infrastructure")
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210922165600.179394-1-quic_jiles@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoKVM: x86: Exit to userspace if emulation prepared a completion callback
Hou Wenlong [Tue, 2 Nov 2021 09:15:32 +0000 (17:15 +0800)]
KVM: x86: Exit to userspace if emulation prepared a completion callback

[ Upstream commit adbfb12d4c4517a8adde23a7fc46538953d56eea ]

em_rdmsr() and em_wrmsr() return X86EMUL_IO_NEEDED if MSR accesses
required an exit to userspace. However, x86_emulate_insn() doesn't return
X86EMUL_*, so x86_emulate_instruction() doesn't directly act on
X86EMUL_IO_NEEDED; instead, it looks for other signals to differentiate
between PIO, MMIO, etc. causing RDMSR/WRMSR emulation not to
exit to userspace now.

Nevertheless, if the userspace_msr_exit_test testcase in selftests
is changed to test RDMSR/WRMSR with a forced emulation prefix,
the test passes.  What happens is that first userspace exit
information is filled but the userspace exit does not happen.
Because x86_emulate_instruction() returns 1, the guest retries
the instruction---but this time RIP has already been adjusted
past the forced emulation prefix, so the guest executes RDMSR/WRMSR
and the userspace exit finally happens.

Since the X86EMUL_IO_NEEDED path has provided a complete_userspace_io
callback, x86_emulate_instruction() can just return 0 if the
callback is not NULL. Then RDMSR/WRMSR instruction emulation will
exit to userspace directly, without the RDMSR/WRMSR vmexit.

Fixes: 1ae099540e8c7 ("KVM: x86: Allow deflecting unknown MSR accesses to user space")
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <56f9df2ee5c05a81155e2be366c9dc1f7adc8817.1635842679.git.houwenlong93@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoKVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg
Sean Christopherson [Tue, 2 Nov 2021 09:15:29 +0000 (17:15 +0800)]
KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg

[ Upstream commit 5e854864ee4384736f27a986633bae21731a4e4e ]

Truncate the new EIP to a 32-bit value when handling EMULTYPE_SKIP as the
decode phase does not truncate _eip.  Wrapping the 32-bit boundary is
legal if and only if CS is a flat code segment, but that check is
implicitly handled in the form of limit checks in the decode phase.

Opportunstically prepare for a future fix by storing the result of any
truncation in "eip" instead of "_eip".

Fixes: 1957aa63be53 ("KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <093eabb1eab2965201c9b018373baf26ff256d85.1635842679.git.houwenlong93@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoKVM: X86: Ensure that dirty PDPTRs are loaded
Lai Jiangshan [Mon, 8 Nov 2021 12:43:53 +0000 (20:43 +0800)]
KVM: X86: Ensure that dirty PDPTRs are loaded

[ Upstream commit 2c5653caecc4807b8abfe9c41880ac38417be7bf ]

For VMX with EPT, dirty PDPTRs need to be loaded before the next vmentry
via vmx_load_mmu_pgd()

But not all paths that call load_pdptrs() will cause vmx_load_mmu_pgd()
to be invoked.  Normally, kvm_mmu_reset_context() is used to cause
KVM_REQ_LOAD_MMU_PGD, but sometimes it is skipped:

* commit d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and
CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
when changing CR0.CD and CR0.NW.

* commit 21823fbda552("KVM: x86: Invalidate all PGDs for the current
PCID on MOV CR3 w/ flush") skips KVM_REQ_LOAD_MMU_PGD after
load_pdptrs() when rewriting the CR3 with the same value.

* commit a91a7c709600("KVM: X86: Don't reset mmu context when
toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
load_pdptrs() when changing CR4.PGE.

Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211108124407.12187-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoKVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration
Sean Christopherson [Sat, 9 Oct 2021 02:12:19 +0000 (19:12 -0700)]
KVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration

[ Upstream commit cfb0e1306a3790eb055ebf7cdb7b0ee8a23e9b6e ]

Use READ_ONCE() when loading the posted interrupt descriptor control
field to ensure "old" and "new" have the same base value.  If the
compiler emits separate loads, and loads into "new" before "old", KVM
could theoretically drop the ON bit if it were set between the loads.

Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-27-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoKVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU
Sean Christopherson [Sat, 9 Oct 2021 02:11:56 +0000 (19:11 -0700)]
KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU

[ Upstream commit 6f390916c4fb359507d9ac4bf1b28a4f8abee5c0 ]

Wrap s390's halt_poll_max_steal with READ_ONCE and snapshot the result of
kvm_arch_no_poll() in kvm_vcpu_block() to avoid a mostly-theoretical,
largely benign bug on s390 where the result of kvm_arch_no_poll() could
change due to userspace modifying halt_poll_max_steal while the vCPU is
blocking.  The bug is largely benign as it will either cause KVM to skip
updating halt-polling times (no_poll toggles false=>true) or to update
halt-polling times with a slightly flawed block_ns.

Note, READ_ONCE is unnecessary in the current code, add it in case the
arch hook is ever inlined, and to provide a hint that userspace can
change the param at will.

Fixes: 8b905d28ee17 ("KVM: s390: provide kvm_arch_no_poll function")
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoKVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guest
Paolo Bonzini [Tue, 16 Nov 2021 14:32:47 +0000 (09:32 -0500)]
KVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guest

[ Upstream commit 1831fa44df743a7cdffdf1c12c799bf6f3c12b8c ]

Don't configure the wakeup handler when a vCPU is blocking with IRQs
disabled, in which case any IRQ, posted or otherwise, should not be
recognized and thus should not wake the vCPU.

Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: aardvark: Fix checking for MEM resource type
Pali Rohár [Thu, 25 Nov 2021 16:01:47 +0000 (17:01 +0100)]
PCI: aardvark: Fix checking for MEM resource type

[ Upstream commit 2070b2ddea89f5b604fac3d27ade5cb6d19a5706 ]

IORESOURCE_MEM_64 is not a resource type but a type flag.

Remove incorrect check for type IORESOURCE_MEM_64.

Link: https://lore.kernel.org/r/20211125160148.26029-2-kabel@kernel.org
Fixes: 64f160e19e92 ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: dwc: Do not remap invalid res
Tim Harvey [Mon, 1 Nov 2021 18:02:43 +0000 (11:02 -0700)]
PCI: dwc: Do not remap invalid res

[ Upstream commit 6e5ebc96ec651b67131f816d7e3bf286c635e749 ]

On imx6 and perhaps others when pcie probes you get a:
imx6q-pcie 33800000.pcie: invalid resource

This occurs because the atu is not specified in the DT and as such it
should not be remapped.

Link: https://lore.kernel.org/r/20211101180243.23761-1-tharvey@gateworks.com
Fixes: 281f1f99cf3a ("PCI: dwc: Detect number of iATU windows")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
Cc: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: rcar: Check if device is runtime suspended instead of __clk_is_enabled()
Marek Vasut [Mon, 15 Nov 2021 20:46:41 +0000 (21:46 +0100)]
PCI: rcar: Check if device is runtime suspended instead of __clk_is_enabled()

[ Upstream commit d2a14b54989e9ccea8401895fdfbc213bd1f56af ]

Replace __clk_is_enabled() with pm_runtime_suspended(),
as __clk_is_enabled() was checking the wrong bus clock
and caused the following build error too:
  arm-linux-gnueabi-ld: drivers/pci/controller/pcie-rcar-host.o: in function `rcar_pcie_aarch32_abort_handler':
  pcie-rcar-host.c:(.text+0xdd0): undefined reference to `__clk_is_enabled'

Link: https://lore.kernel.org/r/20211115204641.12941-1-marek.vasut@gmail.com
Fixes: a115b1bd3af0 ("PCI: rcar: Add L1 link state fix into data abort hook")
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-renesas-soc@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoPCI: mediatek-gen3: Disable DVFSRC voltage request
Jianjun Wang [Fri, 15 Oct 2021 06:36:02 +0000 (14:36 +0800)]
PCI: mediatek-gen3: Disable DVFSRC voltage request

[ Upstream commit ab344fd43f2958726d17d651c0cb692c67dca382 ]

When the DVFSRC (dynamic voltage and frequency scaling resource collector)
feature is not implemented, the PCIe hardware will assert a voltage request
signal when exit from the L1 PM Substates to request a specific Vcore
voltage, but cannot receive the voltage ready signal, which will cause
the link to fail to exit the L1 PM Substates.

Disable DVFSRC voltage request by default, we need to find a common way to
enable it in the future.

Link: https://lore.kernel.org/r/20211015063602.29058-1-jianjun.wang@mediatek.com
Fixes: d3bf75b579b9 ("PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192")
Tested-by: Qizhong Cheng <qizhong.cheng@mediatek.com>
Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Tzung-Bi Shih <tzungbi@google.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agosignal: In get_signal test for signal_group_exit every time through the loop
Eric W. Biederman [Mon, 15 Nov 2021 17:55:57 +0000 (11:55 -0600)]
signal: In get_signal test for signal_group_exit every time through the loop

[ Upstream commit e7f7c99ba911f56bc338845c1cd72954ba591707 ]

Recently while investigating a problem with rr and signals I noticed
that siglock is dropped in ptrace_signal and get_signal does not jump
to relock.

Looking farther to see if the problem is anywhere else I see that
do_signal_stop also returns if signal_group_exit is true.  I believe
that test can now never be true, but it is a bit hard to trace
through and be certain.

Testing signal_group_exit is not expensive, so move the test for
signal_group_exit into the for loop inside of get_signal to ensure
the test is never skipped improperly.

This has been a potential problem since I added the test for
signal_group_exit was added.

Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/875yssekcd.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoMIPS: fix local_{add,sub}_return on MIPS64
Huang Pei [Wed, 15 Dec 2021 08:44:57 +0000 (16:44 +0800)]
MIPS: fix local_{add,sub}_return on MIPS64

[ Upstream commit 277c8cb3e8ac199f075bf9576ad286687ed17173 ]

Use "daddu/dsubu" for long int on MIPS64 instead of "addu/subu"

Fixes: 7232311ef14c ("local_t: mips extension")
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agomtd: spi-nor: Fix mtd size for s3an flashes
Tudor Ambarus [Tue, 7 Dec 2021 14:02:41 +0000 (16:02 +0200)]
mtd: spi-nor: Fix mtd size for s3an flashes

[ Upstream commit f656b419d41aabafb6b526abc3988dfbf2e5c1ba ]

As it was before the blamed commit, s3an_nor_scan() was called
after mtd size was set with params->size, and it overwrote the mtd
size value with '8 * nor->page_size * nor->info->n_sectors' when
XSR_PAGESIZE was set. With the introduction of
s3an_post_sfdp_fixups(), we missed to update the mtd size for the
s3an flashes. Fix the mtd size by updating both nor->params->size,
(which will update the mtd_info size later on) and nor->mtd.size
(which is used in spi_nor_set_addr_width()).

Fixes: 641edddb4f43 ("mtd: spi-nor: Add s3an_post_sfdp_fixups()")
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Pratyush Yadav <p.yadav@ti.com>
Link: https://lore.kernel.org/r/20211207140254.87681-2-tudor.ambarus@microchip.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agotools/resolve_btf_ids: Close ELF file on error
Andrii Nakryiko [Wed, 24 Nov 2021 00:23:13 +0000 (16:23 -0800)]
tools/resolve_btf_ids: Close ELF file on error

[ Upstream commit 1144ab9bdf3430e1b5b3f22741e5283841951add ]

Fix one case where we don't do explicit clean up.

Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211124002325.1737739-2-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoio_uring: fix no lock protection for ctx->cq_extra
Hao Xu [Thu, 25 Nov 2021 09:21:02 +0000 (17:21 +0800)]
io_uring: fix no lock protection for ctx->cq_extra

[ Upstream commit e302f1046f4c209291b07ff7bc4d15ca26891f16 ]

ctx->cq_extra should be protected by completion lock so that the
req_need_defer() does the right check.

Cc: stable@vger.kernel.org
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211125092103.224502-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoNFSD: Fix zero-length NFSv3 WRITEs
Chuck Lever [Tue, 21 Dec 2021 16:52:06 +0000 (11:52 -0500)]
NFSD: Fix zero-length NFSv3 WRITEs

[ Upstream commit 6a2f774424bfdcc2df3e17de0cefe74a4269cad5 ]

The Linux NFS server currently responds to a zero-length NFSv3 WRITE
request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE
with NFS4_OK and count of zero.

RFC 1813 says of the WRITE procedure's @count argument:

count
         The number of bytes of data to be written. If count is
         0, the WRITE will succeed and return a count of 0,
         barring errors due to permissions checking.

RFC 8881 has similar language for NFSv4, though NFSv4 removed the
explicit @count argument because that value is already contained in
the opaque payload array.

The synthetic client pynfs's WRT4 and WRT15 tests do emit zero-
length WRITEs to exercise this spec requirement. Commit fdec6114ee1f
("nfsd4: zero-length WRITE should succeed") addressed the same
problem there with the same fix.

But interestingly the Linux NFS client does not appear to emit zero-
length WRITEs, instead squelching them. I'm not aware of a test that
can generate such WRITEs for NFSv3, so I wrote a naive C program to
generate a zero-length WRITE and test this fix.

Fixes: 8154ef2776aa ("NFSD: Clean up legacy NFS WRITE argument XDR decoders")
Reported-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoNFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
Chuck Lever [Thu, 30 Sep 2021 21:06:21 +0000 (17:06 -0400)]
NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()

[ Upstream commit dae9a6cab8009e526570e7477ce858dcdfeb256e ]

Refactor.

Now that the NFSv2 and NFSv3 XDR decoders have been converted to
use xdr_streams, the WRITE decoder functions can use
xdr_stream_subsegment() to extract the WRITE payload into its own
xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.

That makes it possible to pass the first kvec, pages array + length,
page_base, and total payload length via a single function parameter.

The payload's page_base is not yet assigned or used, but will be in
subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
Eric Dumazet [Thu, 3 Mar 2022 17:37:28 +0000 (09:37 -0800)]
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()

[ Upstream commit 2d3916f3189172d5c69d33065c3c21119fe539fc ]

While investigating on why a synchronize_net() has been added recently
in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.

Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.

Fixes: f185de28d9ae ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agotracing: Add ustring operation to filtering string pointers
Steven Rostedt [Fri, 14 Jan 2022 01:08:40 +0000 (20:08 -0500)]
tracing: Add ustring operation to filtering string pointers

[ Upstream commit f37c3bbc635994eda203a6da4ba0f9d05165a8d6 ]

Since referencing user space pointers is special, if the user wants to
filter on a field that is a pointer to user space, then they need to
specify it.

Add a ".ustring" attribute to the field name for filters to state that the
field is pointing to user space such that the kernel can take the
appropriate action to read that pointer.

Link: https://lore.kernel.org/all/yt9d8rvmt2jq.fsf@linux.ibm.com/
Fixes: 77360f9bbc7e ("tracing: Add test for user space strings when filtering on string pointers")
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agodrm/amdgpu: check vm ready by amdgpu_vm->evicting flag
Qiang Yu [Mon, 21 Feb 2022 09:53:56 +0000 (17:53 +0800)]
drm/amdgpu: check vm ready by amdgpu_vm->evicting flag

[ Upstream commit c1a66c3bc425ff93774fb2f6eefa67b83170dd7e ]

Workstation application ANSA/META v21.1.4 get this error dmesg when
running CI test suite provided by ANSA/META:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)

This is caused by:
1. create a 256MB buffer in invisible VRAM
2. CPU map the buffer and access it causes vm_fault and try to move
   it to visible VRAM
3. force visible VRAM space and traverse all VRAM bos to check if
   evicting this bo is valuable
4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
   will set amdgpu_vm->evicting, but latter due to not in visible
   VRAM, won't really evict it so not add it to amdgpu_vm->evicted
5. before next CS to clear the amdgpu_vm->evicting, user VM ops
   ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
   but fail in amdgpu_vm_bo_update_mapping() (check
   amdgpu_vm->evicting) and get this error log

This error won't affect functionality as next CS will finish the
waiting VM ops. But we'd better clear the error log by checking
the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling
amdgpu_vm_bo_update_mapping() later.

Another reason is amdgpu_vm->evicted list holds all BOs (both
user buffer and page table), but only page table BOs' eviction
prevent VM ops. amdgpu_vm->evicting flag is set only for page
table BOs, so we should use evicting flag instead of evicted list
in amdgpu_vm_ready().

The side effect of this change is: previously blocked VM op (user
buffer in "evicted" list but no page table in it) gets done
immediately.

v2: update commit comments.

Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoata: pata_hpt37x: fix PCI clock detection
Sergey Shtylyov [Sat, 19 Feb 2022 20:04:29 +0000 (23:04 +0300)]
ata: pata_hpt37x: fix PCI clock detection

[ Upstream commit 5f6b0f2d037c8864f20ff15311c695f65eb09db5 ]

The f_CNT register (at the PCI config. address 0x78) is 16-bit, not
8-bit! The bug was there from the very start... :-(

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Fixes: 669a5db411d8 ("[libata] Add a bunch of PATA drivers.")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agosched/fair: Fix fault in reweight_entity
Tadeusz Struk [Thu, 3 Feb 2022 16:18:46 +0000 (08:18 -0800)]
sched/fair: Fix fault in reweight_entity

[ Upstream commit 13765de8148f71fa795e0a6607de37c49ea5915a ]

Syzbot found a GPF in reweight_entity. This has been bisected to
commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid
sched_task_group")

There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in
reweight_entity() in CFS. The scenario is that the main process spawns
number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
wait, and exit.  For each of the new threads the copy_process() gets
invoked, which adds the new task_struct and calls sched_post_fork()
for it.

In the above scenario there is a possibility that
setpriority(PRIO_PGRP) and set_one_prio() will be called for a thread
in the group that is just being created by copy_process(), and for
which the sched_post_fork() has not been executed yet. This will
trigger a null pointer dereference in reweight_entity(), as it will
try to access the run queue pointer, which hasn't been set.

Before the mentioned change the cfs_rq pointer for the task  has been
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group.  Now it is done in
the sched_post_fork(), which is called after that.  To fix the issue
the remove the update_load param from the update_load param() function
and call reweight_task() only if the task flag doesn't have the
TASK_NEW flag set.

Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220203161846.1160750-1-tadeusz.struk@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoext4: fast commit may miss file actions
Xin Yin [Mon, 17 Jan 2022 09:36:55 +0000 (17:36 +0800)]
ext4: fast commit may miss file actions

[ Upstream commit bdc8a53a6f2f0b1cb5f991440f2100732299eb93 ]

in the follow scenario:
1. jbd start transaction n
2. task A get new handle for transaction n+1
3. task A do some actions and add inode to FC_Q_MAIN fc_q
4. jbd complete transaction n and clear FC_Q_MAIN fc_q
5. task A call fsync

Fast commit will lost the file actions during a full commit.

we should also add updates to staging queue during a full commit.
and in ext4_fc_cleanup(), when reset a inode's fc track range, check
it's i_sync_tid, if it bigger than current transaction tid, do not
rest it, or we will lost the track range.

And EXT4_MF_FC_COMMITTING is not needed anymore, so drop it.

Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220117093655.35160-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoext4: fast commit may not fallback for ineligible commit
Xin Yin [Mon, 17 Jan 2022 09:36:54 +0000 (17:36 +0800)]
ext4: fast commit may not fallback for ineligible commit

[ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ]

For the follow scenario:
1. jbd start commit transaction n
2. task A get new handle for transaction n+1
3. task A do some ineligible actions and mark FC_INELIGIBLE
4. jbd complete transaction n and clean FC_INELIGIBLE
5. task A call fsync

In this case fast commit will not fallback to full commit and
transaction n+1 also not handled by jbd.

Make ext4_fc_mark_ineligible() also record transaction tid for
latest ineligible case, when call ext4_fc_cleanup() check
current transaction tid, if small than latest ineligible tid
do not clear the EXT4_MF_FC_INELIGIBLE.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoext4: simplify updating of fast commit stats
Harshad Shirwadkar [Thu, 23 Dec 2021 20:21:39 +0000 (12:21 -0800)]
ext4: simplify updating of fast commit stats

[ Upstream commit 0915e464cb274648e1ef1663e1356e53ff400983 ]

Move fast commit stats updating logic to a separate function from
ext4_fc_commit(). This significantly improves readability of
ext4_fc_commit().

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223202140.2061101-4-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoext4: drop ineligible txn start stop APIs
Harshad Shirwadkar [Thu, 23 Dec 2021 20:21:38 +0000 (12:21 -0800)]
ext4: drop ineligible txn start stop APIs

[ Upstream commit 7bbbe241ec7ce0def9f71464c878fdbd2b0dcf37 ]

This patch drops ext4_fc_start_ineligible() and
ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions
should simply call ext4_fc_mark_ineligible() after starting the
trasaction.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223202140.2061101-3-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoserial: stm32: prevent TDR register overwrite when sending x_char
Valentin Caron [Tue, 11 Jan 2022 16:44:40 +0000 (17:44 +0100)]
serial: stm32: prevent TDR register overwrite when sending x_char

[ Upstream commit d3d079bde07e1b7deaeb57506dc0b86010121d17 ]

When sending x_char in stm32_usart_transmit_chars(), driver can overwrite
the value of TDR register by the value of x_char. If this happens, the
previous value that was present in TDR register will not be sent through
uart.

This code checks if the previous value in TDR register is sent before
writing the x_char value into register.

Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
Link: https://lore.kernel.org/r/20220111164441.6178-2-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agoarm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL
Masami Hiramatsu [Mon, 24 Jan 2022 08:17:54 +0000 (17:17 +0900)]
arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL

[ Upstream commit 1e0924bd09916fab795fc2a21ec1d148f24299fd ]

Mark the start_backtrace() as notrace and NOKPROBE_SYMBOL
because this function is called from ftrace and lockdep to
get the caller address via return_address(). The lockdep
is used in kprobes, it should also be NOKPROBE_SYMBOL.

Fixes: b07f3499661c ("arm64: stacktrace: Move start_backtrace() out of the header")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/164301227374.1433152.12808232644267107415.stgit@devnote2
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 years agotracing: Add test for user space strings when filtering on string pointers
Steven Rostedt [Mon, 10 Jan 2022 16:55:32 +0000 (11:55 -0500)]
tracing: Add test for user space strings when filtering on string pointers

[ Upstream commit 77360f9bbc7e5e2ab7a2c8b4c0244fbbfcfc6f62 ]

Pingfan reported that the following causes a fault:

  echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
  echo 1 > events/syscalls/sys_enter_at/enable

The reason is that trace event filter treats the user space pointer
defined by "filename" as a normal pointer to compare against the "cpu"
string. The following bug happened:

 kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0001) - permissions violation
 PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
 Oops: 0001 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:strlen+0x0/0x20
 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
       48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
       48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
 RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
 RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
 RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
 RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
 R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
 R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
 FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  filter_pred_pchar+0x18/0x40
  filter_match_preds+0x31/0x70
  ftrace_syscall_enter+0x27a/0x2c0
  syscall_trace_enter.constprop.0+0x1aa/0x1d0
  do_syscall_64+0x16/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fa861d88664

The above happened because the kernel tried to access user space directly
and triggered a "supervisor read access in kernel mode" fault. Worse yet,
the memory could not even be loaded yet, and a SEGFAULT could happen as
well. This could be true for kernel space accessing as well.

To be even more robust, test both kernel and user space strings. If the
string fails to read, then simply have the filter fail.

Note, TASK_SIZE is used to determine if the pointer is user or kernel space
and the appropriate strncpy_from_kernel/user_nofault() function is used to
copy the memory. For some architectures, the compare to TASK_SIZE may always
pick user space or kernel space. If it gets it wrong, the only thing is that
the filter will fail to match. In the future, this needs to be fixed to have
the event denote which should be used. But failing a filter is much better
than panicing the machine, and that can be solved later.

Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/
Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Fixes: 87a342f5db69d ("tracing/filters: Support filtering for char * strings")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>